From patchwork Thu Mar 26 14:01:17 2020
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 221790
From: Jason Wang
To: mst@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org
Cc: jgg@mellanox.com, maxime.coquelin@redhat.com, cunming.liang@intel.com,
    zhihong.wang@intel.com, rob.miller@broadcom.com, xiao.w.wang@intel.com,
    lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com,
    parav@mellanox.com, kevin.tian@intel.com, stefanha@redhat.com,
    rdunlap@infradead.org, hch@infradead.org, aadam@redhat.com,
    jiri@mellanox.com, shahafs@mellanox.com, hanand@xilinx.com,
    mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com,
    vmireyno@marvell.com, zhangweining@ruijie.com.cn, Jason Wang
Subject: [PATCH V9 1/9] vhost: refine vhost and vringh kconfig
Date: Thu, 26 Mar 2020 22:01:17 +0800
Message-Id: <20200326140125.19794-2-jasowang@redhat.com>
In-Reply-To: <20200326140125.19794-1-jasowang@redhat.com>
References: <20200326140125.19794-1-jasowang@redhat.com>
Currently, CONFIG_VHOST depends on CONFIG_VIRTUALIZATION. But vhost is
not necessarily tied to VMs, since it is a generic userspace/kernel
communication protocol. Such a dependency may prevent archs without
virtualization support from using vhost.

To solve this, a dedicated vhost menu is created under drivers so that
CONFIG_VHOST can be decoupled from CONFIG_VIRTUALIZATION.

While at it, also squash Kconfig.vringh into the vhost Kconfig file.
This avoids the trick of conditional inclusion from VOP or CAIF, and it
makes it easier to introduce new vringh users and common dependencies
for both vringh and vhost.

Signed-off-by: Jason Wang
---
 arch/arm/kvm/Kconfig         |  2 --
 arch/arm64/kvm/Kconfig       |  2 --
 arch/mips/kvm/Kconfig        |  2 --
 arch/powerpc/kvm/Kconfig     |  2 --
 arch/s390/kvm/Kconfig        |  4 ----
 arch/x86/kvm/Kconfig         |  4 ----
 drivers/Kconfig              |  2 ++
 drivers/misc/mic/Kconfig     |  4 ----
 drivers/net/caif/Kconfig     |  4 ----
 drivers/vhost/Kconfig        | 23 ++++++++++++++---------
 drivers/vhost/Kconfig.vringh |  6 ------
 11 files changed, 16 insertions(+), 39 deletions(-)
 delete mode 100644 drivers/vhost/Kconfig.vringh

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index f591026347a5..be97393761bf 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -54,6 +54,4 @@ config KVM_ARM_HOST
 	---help---
 	  Provides host support for ARM processors.
 
-source "drivers/vhost/Kconfig"
-
 endif # VIRTUALIZATION

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a475c68cbfec..449386d76441 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -64,6 +64,4 @@ config KVM_ARM_PMU
 config KVM_INDIRECT_VECTORS
 	def_bool KVM && (HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS)
 
-source "drivers/vhost/Kconfig"
-
 endif # VIRTUALIZATION

diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index eac25aef21e0..b91d145aa2d5 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -72,6 +72,4 @@ config KVM_MIPS_DEBUG_COP0_COUNTERS
 	  If unsure, say N.
 
-source "drivers/vhost/Kconfig"
-
 endif # VIRTUALIZATION

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 711fca9bc6f0..12885eda324e 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -204,6 +204,4 @@ config KVM_XIVE
 	default y
 	depends on KVM_XICS && PPC_XIVE_NATIVE && KVM_BOOK3S_HV_POSSIBLE
 
-source "drivers/vhost/Kconfig"
-
 endif # VIRTUALIZATION

diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index d3db3d7ed077..def3b60f1fe8 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -55,8 +55,4 @@ config KVM_S390_UCONTROL
 	  If unsure, say N.
 
-# OK, it's a little counter-intuitive to do this, but it puts it neatly under
-# the virtualization menu.
-source "drivers/vhost/Kconfig"
-
 endif # VIRTUALIZATION

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 991019d5eee1..0dfe70e17af9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -94,8 +94,4 @@ config KVM_MMU_AUDIT
 	  This option adds a R/W kVM module parameter 'mmu_audit', which allows
 	  auditing of KVM MMU events at runtime.
 
-# OK, it's a little counter-intuitive to do this, but it puts it neatly under
-# the virtualization menu.
-source "drivers/vhost/Kconfig" - endif # VIRTUALIZATION diff --git a/drivers/Kconfig b/drivers/Kconfig index 8befa53f43be..7a6d8b2b68b4 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -138,6 +138,8 @@ source "drivers/virt/Kconfig" source "drivers/virtio/Kconfig" +source "drivers/vhost/Kconfig" + source "drivers/hv/Kconfig" source "drivers/xen/Kconfig" diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig index b6841ba6d922..8f201d019f5a 100644 --- a/drivers/misc/mic/Kconfig +++ b/drivers/misc/mic/Kconfig @@ -133,8 +133,4 @@ config VOP OS and tools for MIC to use with this driver are available from . -if VOP -source "drivers/vhost/Kconfig.vringh" -endif - endmenu diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig index e74e2bb61236..9db0570c5beb 100644 --- a/drivers/net/caif/Kconfig +++ b/drivers/net/caif/Kconfig @@ -58,8 +58,4 @@ config CAIF_VIRTIO ---help--- The CAIF driver for CAIF over Virtio. -if CAIF_VIRTIO -source "drivers/vhost/Kconfig.vringh" -endif - endif # CAIF_DRIVERS diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 3d03ccbd1adc..4aef10a54cd1 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -1,8 +1,20 @@ # SPDX-License-Identifier: GPL-2.0-only +config VHOST_RING + tristate + help + This option is selected by any driver which needs to access + the host side of a virtio ring. + +menuconfig VHOST + tristate "Host kernel accelerator for virtio (VHOST)" + help + This option is selected by any driver which needs to access + the core of vhost. +if VHOST + config VHOST_NET tristate "Host kernel accelerator for virtio net" depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) - select VHOST ---help--- This kernel module can be loaded in host kernel to accelerate guest networking with virtio_net. Not to be confused with virtio_net @@ -14,7 +26,6 @@ config VHOST_NET config VHOST_SCSI tristate "VHOST_SCSI TCM fabric driver" depends on TARGET_CORE && EVENTFD - select VHOST default n ---help--- Say M here to enable the vhost_scsi TCM fabric module @@ -24,7 +35,6 @@ config VHOST_VSOCK tristate "vhost virtio-vsock driver" depends on VSOCKETS && EVENTFD select VIRTIO_VSOCKETS_COMMON - select VHOST default n ---help--- This kernel module can be loaded in the host kernel to provide AF_VSOCK @@ -34,12 +44,6 @@ config VHOST_VSOCK To compile this driver as a module, choose M here: the module will be called vhost_vsock. -config VHOST - tristate - ---help--- - This option is selected by any driver which needs to access - the core of vhost. - config VHOST_CROSS_ENDIAN_LEGACY bool "Cross-endian support for vhost" default n @@ -54,3 +58,4 @@ config VHOST_CROSS_ENDIAN_LEGACY adds some overhead, it is disabled by default. If unsure, say "N". +endif diff --git a/drivers/vhost/Kconfig.vringh b/drivers/vhost/Kconfig.vringh deleted file mode 100644 index c1fe36a9b8d4..000000000000 --- a/drivers/vhost/Kconfig.vringh +++ /dev/null @@ -1,6 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0-only -config VHOST_RING - tristate - ---help--- - This option is selected by any driver which needs to access - the host side of a virtio ring. 
From patchwork Thu Mar 26 14:01:19 2020
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 221789
From: Jason Wang
To: mst@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org
Cc: jgg@mellanox.com, maxime.coquelin@redhat.com, cunming.liang@intel.com,
    zhihong.wang@intel.com, rob.miller@broadcom.com, xiao.w.wang@intel.com,
    lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com,
    parav@mellanox.com, kevin.tian@intel.com, stefanha@redhat.com,
    rdunlap@infradead.org, hch@infradead.org, aadam@redhat.com,
    jiri@mellanox.com, shahafs@mellanox.com, hanand@xilinx.com,
    mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com,
    vmireyno@marvell.com, zhangweining@ruijie.com.cn, Jason Wang
Subject: [PATCH V9 3/9] vhost: factor out IOTLB
Date: Thu, 26 Mar 2020 22:01:19 +0800
Message-Id: <20200326140125.19794-4-jasowang@redhat.com>
In-Reply-To: <20200326140125.19794-1-jasowang@redhat.com>
References: <20200326140125.19794-1-jasowang@redhat.com>

This patch factors out the IOTLB into a dedicated module so that it can
be reused by other modules like vringh. Users may enable automatic
retiring of the oldest entry when the entry limit is hit by specifying
the VHOST_IOTLB_FLAG_RETIRE flag, which fits the vhost device IOTLB
implementation.
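As a usage sketch of the API added below (illustrative only; the function
name and the addresses are not part of the patch, only the vhost_iotlb_*
calls, flags and fields are):

	/* Illustrative: allocate, populate, query and tear down an IOTLB. */
	#include <linux/vhost_iotlb.h>
	#include <linux/printk.h>

	static int example_iotlb_usage(void)
	{
		struct vhost_iotlb *iotlb;
		struct vhost_iotlb_map *map;
		int err;

		/* At most 2048 entries; the oldest is retired when full. */
		iotlb = vhost_iotlb_alloc(2048, VHOST_IOTLB_FLAG_RETIRE);
		if (!iotlb)
			return -ENOMEM;

		/* Map IOVA [0x1000, 0x1fff] to target address 0xa000, R/W. */
		err = vhost_iotlb_add_range(iotlb, 0x1000, 0x1fff,
					    0xa000, VHOST_MAP_RW);
		if (err)
			goto out;

		/* Find the first mapping overlapping [0x1800, 0x18ff]. */
		map = vhost_iotlb_itree_first(iotlb, 0x1800, 0x18ff);
		if (map)
			pr_debug("0x1800 translates to %llx\n",
				 map->addr + 0x1800 - map->start);

		vhost_iotlb_del_range(iotlb, 0x1000, 0x1fff);
	out:
		vhost_iotlb_free(iotlb);
		return err;
	}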
Signed-off-by: Jason Wang
---
 MAINTAINERS                 |   1 +
 drivers/vhost/Kconfig       |   9 ++
 drivers/vhost/Makefile      |   3 +
 drivers/vhost/iotlb.c       | 177 +++++++++++++++++++++++++
 drivers/vhost/net.c         |   2 +-
 drivers/vhost/vhost.c       | 221 +++++++++++------------------------
 drivers/vhost/vhost.h       |  39 ++-----
 include/linux/vhost_iotlb.h |  47 ++++++++
 8 files changed, 318 insertions(+), 181 deletions(-)
 create mode 100644 drivers/vhost/iotlb.c
 create mode 100644 include/linux/vhost_iotlb.h

diff --git a/MAINTAINERS b/MAINTAINERS
index c74e4ea714a5..0fb645b5a7df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17768,6 +17768,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
 S: Maintained
 F: drivers/vhost/
 F: include/uapi/linux/vhost.h
+F: include/linux/vhost_iotlb.h
 
 VIRTIO INPUT DRIVER
 M: Gerd Hoffmann

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 4aef10a54cd1..2f6c2105b953 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,4 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0-only
+config VHOST_IOTLB
+	tristate
+	help
+	  Generic IOTLB implementation for vhost and vringh.
+
 config VHOST_RING
 	tristate
 	help
@@ -7,9 +12,11 @@ config VHOST_RING
 
 menuconfig VHOST
 	tristate "Host kernel accelerator for virtio (VHOST)"
+	select VHOST_IOTLB
 	help
 	  This option is selected by any driver which needs to access
 	  the core of vhost.
+
 if VHOST
 
 config VHOST_NET
@@ -58,4 +65,6 @@ config VHOST_CROSS_ENDIAN_LEGACY
 	  adds some overhead, it is disabled by default.
 
 	  If unsure, say "N".
+
 endif
+

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..fb831002bcf0 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -11,3 +11,6 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
 obj-$(CONFIG_VHOST) += vhost.o
+
+obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
+vhost_iotlb-y := iotlb.o

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
new file mode 100644
index 000000000000..1f0ca6e44410
--- /dev/null
+++ b/drivers/vhost/iotlb.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2020 Red Hat, Inc.
+ * Author: Jason Wang
+ *
+ * IOTLB implementation for vhost.
+ */
+#include
+#include
+#include
+
+#define MOD_VERSION "0.1"
+#define MOD_DESC "VHOST IOTLB"
+#define MOD_AUTHOR "Jason Wang "
+#define MOD_LICENSE "GPL v2"
+
+#define START(map) ((map)->start)
+#define LAST(map) ((map)->last)
+
+INTERVAL_TREE_DEFINE(struct vhost_iotlb_map,
+		     rb, __u64, __subtree_last,
+		     START, LAST, static inline, vhost_iotlb_itree);
+
+/**
+ * vhost_iotlb_map_free - remove a map node and free it
+ * @iotlb: the IOTLB
+ * @map: the map to be removed and freed
+ */
+void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
+			  struct vhost_iotlb_map *map)
+{
+	vhost_iotlb_itree_remove(map, &iotlb->root);
+	list_del(&map->link);
+	kfree(map);
+	iotlb->nmaps--;
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_map_free);
+
+/**
+ * vhost_iotlb_add_range - add a new range to vhost IOTLB
+ * @iotlb: the IOTLB
+ * @start: start of the IOVA range
+ * @last: last byte of the IOVA range
+ * @addr: the address that is mapped to @start
+ * @perm: access permission of this range
+ *
+ * Returns an error if @last is smaller than @start or if memory
+ * allocation fails.
+ */
+int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
+			  u64 start, u64 last,
+			  u64 addr, unsigned int perm)
+{
+	struct vhost_iotlb_map *map;
+
+	if (last < start)
+		return -EFAULT;
+
+	if (iotlb->limit &&
+	    iotlb->nmaps == iotlb->limit &&
+	    iotlb->flags & VHOST_IOTLB_FLAG_RETIRE) {
+		map = list_first_entry(&iotlb->list, typeof(*map), link);
+		vhost_iotlb_map_free(iotlb, map);
+	}
+
+	map = kmalloc(sizeof(*map), GFP_ATOMIC);
+	if (!map)
+		return -ENOMEM;
+
+	map->start = start;
+	map->size = last - start + 1;
+	map->last = last;
+	map->addr = addr;
+	map->perm = perm;
+
+	iotlb->nmaps++;
+	vhost_iotlb_itree_insert(map, &iotlb->root);
+
+	INIT_LIST_HEAD(&map->link);
+	list_add_tail(&map->link, &iotlb->list);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_add_range);
+
+/**
+ * vhost_iotlb_del_range - delete overlapped ranges from vhost IOTLB
+ * @iotlb: the IOTLB
+ * @start: start of the IOVA range
+ * @last: last byte of the IOVA range
+ */
+void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last)
+{
+	struct vhost_iotlb_map *map;
+
+	while ((map = vhost_iotlb_itree_iter_first(&iotlb->root,
+						   start, last)))
+		vhost_iotlb_map_free(iotlb, map);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_del_range);
+
+/**
+ * vhost_iotlb_alloc - allocate a new vhost IOTLB
+ * @limit: maximum number of IOTLB entries
+ * @flags: VHOST_IOTLB_FLAG_XXX
+ *
+ * Returns NULL if memory allocation fails.
+ */
+struct vhost_iotlb *vhost_iotlb_alloc(unsigned int limit, unsigned int flags)
+{
+	struct vhost_iotlb *iotlb = kzalloc(sizeof(*iotlb), GFP_KERNEL);
+
+	if (!iotlb)
+		return NULL;
+
+	iotlb->root = RB_ROOT_CACHED;
+	iotlb->limit = limit;
+	iotlb->nmaps = 0;
+	iotlb->flags = flags;
+	INIT_LIST_HEAD(&iotlb->list);
+
+	return iotlb;
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_alloc);
+
+/**
+ * vhost_iotlb_reset - reset vhost IOTLB (free all IOTLB entries)
+ * @iotlb: the IOTLB to be reset
+ */
+void vhost_iotlb_reset(struct vhost_iotlb *iotlb)
+{
+	vhost_iotlb_del_range(iotlb, 0ULL, 0ULL - 1);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_reset);
+
+/**
+ * vhost_iotlb_free - reset and free vhost IOTLB
+ * @iotlb: the IOTLB to be freed
+ */
+void vhost_iotlb_free(struct vhost_iotlb *iotlb)
+{
+	if (iotlb) {
+		vhost_iotlb_reset(iotlb);
+		kfree(iotlb);
+	}
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_free);
+
+/**
+ * vhost_iotlb_itree_first - return the first overlapped range
+ * @iotlb: the IOTLB
+ * @start: start of the IOVA range
+ * @last: last byte of the IOVA range
+ */
+struct vhost_iotlb_map *
+vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
+{
+	return vhost_iotlb_itree_iter_first(&iotlb->root, start, last);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
+
+/**
+ * vhost_iotlb_itree_next - return the next overlapped range
+ * @map: the starting map node
+ * @start: start of the IOVA range
+ * @last: last byte of the IOVA range
+ */
+struct vhost_iotlb_map *
+vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)
+{
+	return vhost_iotlb_itree_iter_next(map, start, last);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_itree_next);
+
+MODULE_VERSION(MOD_VERSION);
+MODULE_DESCRIPTION(MOD_DESC);
+MODULE_AUTHOR(MOD_AUTHOR);
+MODULE_LICENSE(MOD_LICENSE);

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c8ab8d83b530..c7298b192ab4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1595,7 +1595,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
 	struct socket *tx_sock = NULL;
 	struct socket *rx_sock = NULL;
 	long err;
-	struct vhost_umem *umem;
+	struct vhost_iotlb *umem;
 
 	mutex_lock(&n->dev.mutex);
 	err = vhost_dev_check_owner(&n->dev);

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8e9e2341e40a..d450e16c5c25 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -50,10 +50,6 @@ enum {
 #define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num])
 #define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num])
 
-INTERVAL_TREE_DEFINE(struct vhost_umem_node,
-		     rb, __u64, __subtree_last,
-		     START, LAST, static inline, vhost_umem_interval_tree);
-
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
 {
@@ -584,21 +580,25 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
 
-struct vhost_umem *vhost_dev_reset_owner_prepare(void)
+static struct vhost_iotlb *iotlb_alloc(void)
+{
+	return vhost_iotlb_alloc(max_iotlb_entries,
+				 VHOST_IOTLB_FLAG_RETIRE);
+}
+
+struct vhost_iotlb *vhost_dev_reset_owner_prepare(void)
 {
-	return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL);
+	return iotlb_alloc();
 }
 EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
 
 /* Caller should have device mutex */
-void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_umem *umem)
+void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem)
 {
 	int i;
 
 	vhost_dev_cleanup(dev);
 
-	/* Restore memory to default empty mapping. */
-	INIT_LIST_HEAD(&umem->umem_list);
 	dev->umem = umem;
 	/* We don't need VQ locks below since vhost_dev_cleanup makes sure
 	 * VQs aren't running.
@@ -621,28 +621,6 @@ void vhost_dev_stop(struct vhost_dev *dev) } EXPORT_SYMBOL_GPL(vhost_dev_stop); -static void vhost_umem_free(struct vhost_umem *umem, - struct vhost_umem_node *node) -{ - vhost_umem_interval_tree_remove(node, &umem->umem_tree); - list_del(&node->link); - kfree(node); - umem->numem--; -} - -static void vhost_umem_clean(struct vhost_umem *umem) -{ - struct vhost_umem_node *node, *tmp; - - if (!umem) - return; - - list_for_each_entry_safe(node, tmp, &umem->umem_list, link) - vhost_umem_free(umem, node); - - kvfree(umem); -} - static void vhost_clear_msg(struct vhost_dev *dev) { struct vhost_msg_node *node, *n; @@ -680,9 +658,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev) eventfd_ctx_put(dev->log_ctx); dev->log_ctx = NULL; /* No one will access memory at this point */ - vhost_umem_clean(dev->umem); + vhost_iotlb_free(dev->umem); dev->umem = NULL; - vhost_umem_clean(dev->iotlb); + vhost_iotlb_free(dev->iotlb); dev->iotlb = NULL; vhost_clear_msg(dev); wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM); @@ -718,27 +696,26 @@ static bool vhost_overflow(u64 uaddr, u64 size) } /* Caller should have vq mutex and device mutex. */ -static bool vq_memory_access_ok(void __user *log_base, struct vhost_umem *umem, +static bool vq_memory_access_ok(void __user *log_base, struct vhost_iotlb *umem, int log_all) { - struct vhost_umem_node *node; + struct vhost_iotlb_map *map; if (!umem) return false; - list_for_each_entry(node, &umem->umem_list, link) { - unsigned long a = node->userspace_addr; + list_for_each_entry(map, &umem->list, link) { + unsigned long a = map->addr; - if (vhost_overflow(node->userspace_addr, node->size)) + if (vhost_overflow(map->addr, map->size)) return false; - if (!access_ok((void __user *)a, - node->size)) + if (!access_ok((void __user *)a, map->size)) return false; else if (log_all && !log_access_ok(log_base, - node->start, - node->size)) + map->start, + map->size)) return false; } return true; @@ -748,17 +725,17 @@ static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq, u64 addr, unsigned int size, int type) { - const struct vhost_umem_node *node = vq->meta_iotlb[type]; + const struct vhost_iotlb_map *map = vq->meta_iotlb[type]; - if (!node) + if (!map) return NULL; - return (void *)(uintptr_t)(node->userspace_addr + addr - node->start); + return (void *)(uintptr_t)(map->addr + addr - map->start); } /* Can we switch to this memory table? 
*/ /* Caller should have device mutex but not vq mutex */ -static bool memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem, +static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem, int log_all) { int i; @@ -1023,47 +1000,6 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq, return vhost_copy_from_user(vq, desc, vq->desc + idx, sizeof(*desc)); } -static int vhost_new_umem_range(struct vhost_umem *umem, - u64 start, u64 size, u64 end, - u64 userspace_addr, int perm) -{ - struct vhost_umem_node *tmp, *node; - - if (!size) - return -EFAULT; - - node = kmalloc(sizeof(*node), GFP_ATOMIC); - if (!node) - return -ENOMEM; - - if (umem->numem == max_iotlb_entries) { - tmp = list_first_entry(&umem->umem_list, typeof(*tmp), link); - vhost_umem_free(umem, tmp); - } - - node->start = start; - node->size = size; - node->last = end; - node->userspace_addr = userspace_addr; - node->perm = perm; - INIT_LIST_HEAD(&node->link); - list_add_tail(&node->link, &umem->umem_list); - vhost_umem_interval_tree_insert(node, &umem->umem_tree); - umem->numem++; - - return 0; -} - -static void vhost_del_umem_range(struct vhost_umem *umem, - u64 start, u64 end) -{ - struct vhost_umem_node *node; - - while ((node = vhost_umem_interval_tree_iter_first(&umem->umem_tree, - start, end))) - vhost_umem_free(umem, node); -} - static void vhost_iotlb_notify_vq(struct vhost_dev *d, struct vhost_iotlb_msg *msg) { @@ -1120,9 +1056,9 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev, break; } vhost_vq_meta_reset(dev); - if (vhost_new_umem_range(dev->iotlb, msg->iova, msg->size, - msg->iova + msg->size - 1, - msg->uaddr, msg->perm)) { + if (vhost_iotlb_add_range(dev->iotlb, msg->iova, + msg->iova + msg->size - 1, + msg->uaddr, msg->perm)) { ret = -ENOMEM; break; } @@ -1134,8 +1070,8 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev, break; } vhost_vq_meta_reset(dev); - vhost_del_umem_range(dev->iotlb, msg->iova, - msg->iova + msg->size - 1); + vhost_iotlb_del_range(dev->iotlb, msg->iova, + msg->iova + msg->size - 1); break; default: ret = -EINVAL; @@ -1319,44 +1255,42 @@ static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, } static void vhost_vq_meta_update(struct vhost_virtqueue *vq, - const struct vhost_umem_node *node, + const struct vhost_iotlb_map *map, int type) { int access = (type == VHOST_ADDR_USED) ? VHOST_ACCESS_WO : VHOST_ACCESS_RO; - if (likely(node->perm & access)) - vq->meta_iotlb[type] = node; + if (likely(map->perm & access)) + vq->meta_iotlb[type] = map; } static bool iotlb_access_ok(struct vhost_virtqueue *vq, int access, u64 addr, u64 len, int type) { - const struct vhost_umem_node *node; - struct vhost_umem *umem = vq->iotlb; + const struct vhost_iotlb_map *map; + struct vhost_iotlb *umem = vq->iotlb; u64 s = 0, size, orig_addr = addr, last = addr + len - 1; if (vhost_vq_meta_fetch(vq, addr, len, type)) return true; while (len > s) { - node = vhost_umem_interval_tree_iter_first(&umem->umem_tree, - addr, - last); - if (node == NULL || node->start > addr) { + map = vhost_iotlb_itree_first(umem, addr, last); + if (map == NULL || map->start > addr) { vhost_iotlb_miss(vq, addr, access); return false; - } else if (!(node->perm & access)) { + } else if (!(map->perm & access)) { /* Report the possible access violation by * request another translation from userspace. 
*/ return false; } - size = node->size - addr + node->start; + size = map->size - addr + map->start; if (orig_addr == addr && size >= len) - vhost_vq_meta_update(vq, node, type); + vhost_vq_meta_update(vq, map, type); s += size; addr += size; @@ -1372,12 +1306,12 @@ int vq_meta_prefetch(struct vhost_virtqueue *vq) if (!vq->iotlb) return 1; - return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc, + return iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->desc, vhost_get_desc_size(vq, num), VHOST_ADDR_DESC) && - iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail, + iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->avail, vhost_get_avail_size(vq, num), VHOST_ADDR_AVAIL) && - iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->used, + iotlb_access_ok(vq, VHOST_MAP_WO, (u64)(uintptr_t)vq->used, vhost_get_used_size(vq, num), VHOST_ADDR_USED); } EXPORT_SYMBOL_GPL(vq_meta_prefetch); @@ -1416,25 +1350,11 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq) } EXPORT_SYMBOL_GPL(vhost_vq_access_ok); -static struct vhost_umem *vhost_umem_alloc(void) -{ - struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL); - - if (!umem) - return NULL; - - umem->umem_tree = RB_ROOT_CACHED; - umem->numem = 0; - INIT_LIST_HEAD(&umem->umem_list); - - return umem; -} - static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) { struct vhost_memory mem, *newmem; struct vhost_memory_region *region; - struct vhost_umem *newumem, *oldumem; + struct vhost_iotlb *newumem, *oldumem; unsigned long size = offsetof(struct vhost_memory, regions); int i; @@ -1456,7 +1376,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) return -EFAULT; } - newumem = vhost_umem_alloc(); + newumem = iotlb_alloc(); if (!newumem) { kvfree(newmem); return -ENOMEM; @@ -1465,13 +1385,12 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) for (region = newmem->regions; region < newmem->regions + mem.nregions; region++) { - if (vhost_new_umem_range(newumem, - region->guest_phys_addr, - region->memory_size, - region->guest_phys_addr + - region->memory_size - 1, - region->userspace_addr, - VHOST_ACCESS_RW)) + if (vhost_iotlb_add_range(newumem, + region->guest_phys_addr, + region->guest_phys_addr + + region->memory_size - 1, + region->userspace_addr, + VHOST_MAP_RW)) goto err; } @@ -1489,11 +1408,11 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) } kvfree(newmem); - vhost_umem_clean(oldumem); + vhost_iotlb_free(oldumem); return 0; err: - vhost_umem_clean(newumem); + vhost_iotlb_free(newumem); kvfree(newmem); return -EFAULT; } @@ -1734,10 +1653,10 @@ EXPORT_SYMBOL_GPL(vhost_vring_ioctl); int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled) { - struct vhost_umem *niotlb, *oiotlb; + struct vhost_iotlb *niotlb, *oiotlb; int i; - niotlb = vhost_umem_alloc(); + niotlb = iotlb_alloc(); if (!niotlb) return -ENOMEM; @@ -1753,7 +1672,7 @@ int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled) mutex_unlock(&vq->mutex); } - vhost_umem_clean(oiotlb); + vhost_iotlb_free(oiotlb); return 0; } @@ -1883,8 +1802,8 @@ static int log_write(void __user *log_base, static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len) { - struct vhost_umem *umem = vq->umem; - struct vhost_umem_node *u; + struct vhost_iotlb *umem = vq->umem; + struct vhost_iotlb_map *u; u64 start, end, l, min; int r; bool hit = false; @@ -1894,16 +1813,15 @@ static int log_write_hva(struct vhost_virtqueue *vq, 
u64 hva, u64 len) /* More than one GPAs can be mapped into a single HVA. So * iterate all possible umems here to be safe. */ - list_for_each_entry(u, &umem->umem_list, link) { - if (u->userspace_addr > hva - 1 + len || - u->userspace_addr - 1 + u->size < hva) + list_for_each_entry(u, &umem->list, link) { + if (u->addr > hva - 1 + len || + u->addr - 1 + u->size < hva) continue; - start = max(u->userspace_addr, hva); - end = min(u->userspace_addr - 1 + u->size, - hva - 1 + len); + start = max(u->addr, hva); + end = min(u->addr - 1 + u->size, hva - 1 + len); l = end - start + 1; r = log_write(vq->log_base, - u->start + start - u->userspace_addr, + u->start + start - u->addr, l); if (r < 0) return r; @@ -2054,9 +1972,9 @@ EXPORT_SYMBOL_GPL(vhost_vq_init_access); static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, struct iovec iov[], int iov_size, int access) { - const struct vhost_umem_node *node; + const struct vhost_iotlb_map *map; struct vhost_dev *dev = vq->dev; - struct vhost_umem *umem = dev->iotlb ? dev->iotlb : dev->umem; + struct vhost_iotlb *umem = dev->iotlb ? dev->iotlb : dev->umem; struct iovec *_iov; u64 s = 0; int ret = 0; @@ -2068,25 +1986,24 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, break; } - node = vhost_umem_interval_tree_iter_first(&umem->umem_tree, - addr, addr + len - 1); - if (node == NULL || node->start > addr) { + map = vhost_iotlb_itree_first(umem, addr, addr + len - 1); + if (map == NULL || map->start > addr) { if (umem != dev->iotlb) { ret = -EFAULT; break; } ret = -EAGAIN; break; - } else if (!(node->perm & access)) { + } else if (!(map->perm & access)) { ret = -EPERM; break; } _iov = iov + ret; - size = node->size - addr + node->start; + size = map->size - addr + map->start; _iov->iov_len = min((u64)len - s, size); _iov->iov_base = (void __user *)(unsigned long) - (node->userspace_addr + addr - node->start); + (map->addr + addr - map->start); s += size; addr += size; ++ret; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index f9d1a03dd153..181382185bbc 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -12,6 +12,7 @@ #include #include #include +#include struct vhost_work; typedef void (*vhost_work_fn_t)(struct vhost_work *work); @@ -52,27 +53,6 @@ struct vhost_log { u64 len; }; -#define START(node) ((node)->start) -#define LAST(node) ((node)->last) - -struct vhost_umem_node { - struct rb_node rb; - struct list_head link; - __u64 start; - __u64 last; - __u64 size; - __u64 userspace_addr; - __u32 perm; - __u32 flags_padding; - __u64 __subtree_last; -}; - -struct vhost_umem { - struct rb_root_cached umem_tree; - struct list_head umem_list; - int numem; -}; - enum vhost_uaddr_type { VHOST_ADDR_DESC = 0, VHOST_ADDR_AVAIL = 1, @@ -90,7 +70,7 @@ struct vhost_virtqueue { struct vring_desc __user *desc; struct vring_avail __user *avail; struct vring_used __user *used; - const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS]; + const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS]; struct file *kick; struct eventfd_ctx *call_ctx; struct eventfd_ctx *error_ctx; @@ -128,8 +108,8 @@ struct vhost_virtqueue { struct iovec *indirect; struct vring_used_elem *heads; /* Protected by virtqueue mutex. 
 */
-	struct vhost_umem *umem;
-	struct vhost_umem *iotlb;
+	struct vhost_iotlb *umem;
+	struct vhost_iotlb *iotlb;
 	void *private_data;
 	u64 acked_features;
 	u64 acked_backend_features;
@@ -164,8 +144,8 @@ struct vhost_dev {
 	struct eventfd_ctx *log_ctx;
 	struct llist_head work_list;
 	struct task_struct *worker;
-	struct vhost_umem *umem;
-	struct vhost_umem *iotlb;
+	struct vhost_iotlb *umem;
+	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
 	struct list_head read_list;
 	struct list_head pending_list;
@@ -186,8 +166,8 @@ void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
-struct vhost_umem *vhost_dev_reset_owner_prepare(void);
-void vhost_dev_reset_owner(struct vhost_dev *, struct vhost_umem *);
+struct vhost_iotlb *vhost_dev_reset_owner_prepare(void);
+void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *iotlb);
 void vhost_dev_cleanup(struct vhost_dev *);
 void vhost_dev_stop(struct vhost_dev *);
 long vhost_dev_ioctl(struct vhost_dev *, unsigned int ioctl, void __user *argp);
@@ -233,6 +213,9 @@ ssize_t vhost_chr_write_iter(struct vhost_dev *dev, struct iov_iter *from);
 
 int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
 
+void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
+			  struct vhost_iotlb_map *map);
+
 #define vq_err(vq, fmt, ...) do {                                  \
 		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \
 		if ((vq)->error_ctx)                               \

diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h
new file mode 100644
index 000000000000..6b09b786a762
--- /dev/null
+++ b/include/linux/vhost_iotlb.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VHOST_IOTLB_H
+#define _LINUX_VHOST_IOTLB_H
+
+#include
+
+struct vhost_iotlb_map {
+	struct rb_node rb;
+	struct list_head link;
+	u64 start;
+	u64 last;
+	u64 size;
+	u64 addr;
+#define VHOST_MAP_RO 0x1
+#define VHOST_MAP_WO 0x2
+#define VHOST_MAP_RW 0x3
+	u32 perm;
+	u32 flags_padding;
+	u64 __subtree_last;
+};
+
+#define VHOST_IOTLB_FLAG_RETIRE 0x1
+
+struct vhost_iotlb {
+	struct rb_root_cached root;
+	struct list_head list;
+	unsigned int limit;
+	unsigned int nmaps;
+	unsigned int flags;
+};
+
+int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last,
+			  u64 addr, unsigned int perm);
+void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last);
+
+struct vhost_iotlb *vhost_iotlb_alloc(unsigned int limit, unsigned int flags);
+void vhost_iotlb_free(struct vhost_iotlb *iotlb);
+void vhost_iotlb_reset(struct vhost_iotlb *iotlb);
+
+struct vhost_iotlb_map *
+vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last);
+struct vhost_iotlb_map *
+vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last);
+
+void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
+			  struct vhost_iotlb_map *map);
+#endif
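Taken together, the two itree helpers above support interval iteration over
the IOTLB. A short, hedged sketch (the walker function is illustrative, not
part of the patch):

	/* Illustrative: dump all maps overlapping [start, last]. */
	static void example_iotlb_walk(struct vhost_iotlb *iotlb,
				       u64 start, u64 last)
	{
		struct vhost_iotlb_map *map;

		for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
		     map = vhost_iotlb_itree_next(map, start, last))
			pr_debug("iova [%llx, %llx] -> %llx (perm %x)\n",
				 map->start, map->last, map->addr, map->perm);
	}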
From patchwork Thu Mar 26 14:01:22 2020
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 221788
From: Jason Wang
To: mst@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org
Cc: jgg@mellanox.com, maxime.coquelin@redhat.com, cunming.liang@intel.com,
    zhihong.wang@intel.com, rob.miller@broadcom.com, xiao.w.wang@intel.com,
    lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com,
    parav@mellanox.com, kevin.tian@intel.com, stefanha@redhat.com,
    rdunlap@infradead.org, hch@infradead.org, aadam@redhat.com,
    jiri@mellanox.com, shahafs@mellanox.com, hanand@xilinx.com,
    mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com,
    vmireyno@marvell.com, zhangweining@ruijie.com.cn, Jason Wang
Subject: [PATCH V9 6/9] virtio: introduce a vDPA based transport
Date: Thu, 26 Mar 2020 22:01:22 +0800
Message-Id: <20200326140125.19794-7-jasowang@redhat.com>
In-Reply-To: <20200326140125.19794-1-jasowang@redhat.com>
References: <20200326140125.19794-1-jasowang@redhat.com>

This patch introduces a vDPA transport for virtio, which allows the
kernel virtio drivers to drive a vDPA device that is capable of
populating the virtqueues directly. A new virtio-vdpa driver is
registered on the vDPA bus; when it probes a vDPA device, it registers
a virtio device with vdpa-based config ops. This makes it a software
transport between the vDPA driver and the vDPA device, implemented
through the bus_ops of the vDPA parent.
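For orientation, this is roughly the contract the transport drives: the vDPA
device supplies vdpa_config_ops callbacks, and virtio_vdpa invokes them on
behalf of the virtio core. A hedged, partial sketch of the device side
follows; all my_* names are hypothetical, the exact callback signatures come
from the vDPA bus patch earlier in this series (not shown here), and most
ops are elided:

	/* Hypothetical device-side skeleton; not a complete driver. */
	static u64 my_get_features(struct vdpa_device *vdpa)
	{
		/* Feature bits the hardware supports. */
		return 1ULL << VIRTIO_F_VERSION_1;
	}

	static u16 my_get_vq_num_max(struct vdpa_device *vdpa)
	{
		return 256;	/* largest ring size the device accepts */
	}

	static const struct vdpa_config_ops my_vdpa_ops = {
		.get_features	= my_get_features,
		.get_vq_num_max	= my_get_vq_num_max,
		/* set_features, get/set_status, set_vq_address, set_vq_ready,
		 * kick_vq, get_device_id, ... are also required (elided).
		 */
	};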
Signed-off-by: Jason Wang
---
 drivers/virtio/Kconfig       |  13 ++
 drivers/virtio/Makefile      |   1 +
 drivers/virtio/virtio_vdpa.c | 396 +++++++++++++++++++++++++++++++++++
 3 files changed, 410 insertions(+)
 create mode 100644 drivers/virtio/virtio_vdpa.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 9c4fdb64d9ac..99e424570644 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -43,6 +43,19 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_VDPA
+	tristate "vDPA driver for virtio devices"
+	select VDPA
+	select VIRTIO
+	help
+	  This driver provides support for virtio based paravirtual
+	  device driver over vDPA bus. For this to be useful, you need
+	  an appropriate vDPA device implementation that operates on a
+	  physical device to allow the datapath of virtio to be
+	  offloaded to hardware.
+
+	  If unsure, say M.
+
 config VIRTIO_PMEM
 	tristate "Support for virtio pmem driver"
 	depends on VIRTIO

diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index fdf5eacd0d0a..3407ac03fe60 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,4 +6,5 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
 obj-$(CONFIG_VDPA) += vdpa/

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
new file mode 100644
index 000000000000..c30eb55030be
--- /dev/null
+++ b/drivers/virtio/virtio_vdpa.c
@@ -0,0 +1,396 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VIRTIO based driver for vDPA device
+ *
+ * Copyright (c) 2020, Red Hat. All rights reserved.
+ * Author: Jason Wang + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MOD_VERSION "0.1" +#define MOD_AUTHOR "Jason Wang " +#define MOD_DESC "vDPA bus driver for virtio devices" +#define MOD_LICENSE "GPL v2" + +struct virtio_vdpa_device { + struct virtio_device vdev; + struct vdpa_device *vdpa; + u64 features; + + /* The lock to protect virtqueue list */ + spinlock_t lock; + /* List of virtio_vdpa_vq_info */ + struct list_head virtqueues; +}; + +struct virtio_vdpa_vq_info { + /* the actual virtqueue */ + struct virtqueue *vq; + + /* the list node for the virtqueues list */ + struct list_head node; +}; + +static inline struct virtio_vdpa_device * +to_virtio_vdpa_device(struct virtio_device *dev) +{ + return container_of(dev, struct virtio_vdpa_device, vdev); +} + +static struct vdpa_device *vd_get_vdpa(struct virtio_device *vdev) +{ + return to_virtio_vdpa_device(vdev)->vdpa; +} + +static void virtio_vdpa_get(struct virtio_device *vdev, unsigned offset, + void *buf, unsigned len) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + ops->get_config(vdpa, offset, buf, len); +} + +static void virtio_vdpa_set(struct virtio_device *vdev, unsigned offset, + const void *buf, unsigned len) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + ops->set_config(vdpa, offset, buf, len); +} + +static u32 virtio_vdpa_generation(struct virtio_device *vdev) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + if (ops->get_generation) + return ops->get_generation(vdpa); + + return 0; +} + +static u8 virtio_vdpa_get_status(struct virtio_device *vdev) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + return ops->get_status(vdpa); +} + +static void virtio_vdpa_set_status(struct virtio_device *vdev, u8 status) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + return ops->set_status(vdpa, status); +} + +static void virtio_vdpa_reset(struct virtio_device *vdev) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + return ops->set_status(vdpa, 0); +} + +static bool virtio_vdpa_notify(struct virtqueue *vq) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vq->vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + ops->kick_vq(vdpa, vq->index); + + return true; +} + +static irqreturn_t virtio_vdpa_config_cb(void *private) +{ + struct virtio_vdpa_device *vd_dev = private; + + virtio_config_changed(&vd_dev->vdev); + + return IRQ_HANDLED; +} + +static irqreturn_t virtio_vdpa_virtqueue_cb(void *private) +{ + struct virtio_vdpa_vq_info *info = private; + + return vring_interrupt(0, info->vq); +} + +static struct virtqueue * +virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index, + void (*callback)(struct virtqueue *vq), + const char *name, bool ctx) +{ + struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev); + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + struct virtio_vdpa_vq_info *info; + struct vdpa_callback cb; + struct virtqueue *vq; + u64 desc_addr, driver_addr, device_addr; + unsigned long flags; + u32 align, num; + int err; + + if (!name) + return NULL; + + /* Queue shouldn't already be set up. 
+ */
+	if (ops->get_vq_ready(vdpa, index))
+		return ERR_PTR(-ENOENT);
+
+	/* Allocate and fill out our active queue description */
+	info = kmalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	num = ops->get_vq_num_max(vdpa);
+	if (num == 0) {
+		err = -ENOENT;
+		goto error_new_virtqueue;
+	}
+
+	/* Create the vring */
+	align = ops->get_vq_align(vdpa);
+	vq = vring_create_virtqueue(index, num, align, vdev,
+				    true, true, ctx,
+				    virtio_vdpa_notify, callback, name);
+	if (!vq) {
+		err = -ENOMEM;
+		goto error_new_virtqueue;
+	}
+
+	/* Setup virtqueue callback */
+	cb.callback = virtio_vdpa_virtqueue_cb;
+	cb.private = info;
+	ops->set_vq_cb(vdpa, index, &cb);
+	ops->set_vq_num(vdpa, index, virtqueue_get_vring_size(vq));
+
+	desc_addr = virtqueue_get_desc_addr(vq);
+	driver_addr = virtqueue_get_avail_addr(vq);
+	device_addr = virtqueue_get_used_addr(vq);
+
+	if (ops->set_vq_address(vdpa, index,
+				desc_addr, driver_addr,
+				device_addr)) {
+		err = -EINVAL;
+		goto err_vq;
+	}
+
+	ops->set_vq_ready(vdpa, index, 1);
+
+	vq->priv = info;
+	info->vq = vq;
+
+	spin_lock_irqsave(&vd_dev->lock, flags);
+	list_add(&info->node, &vd_dev->virtqueues);
+	spin_unlock_irqrestore(&vd_dev->lock, flags);
+
+	return vq;
+
+err_vq:
+	vring_del_virtqueue(vq);
+error_new_virtqueue:
+	ops->set_vq_ready(vdpa, index, 0);
+	/* VDPA driver should make sure vq is stopped here */
+	WARN_ON(ops->get_vq_ready(vdpa, index));
+	kfree(info);
+	return ERR_PTR(err);
+}
+
+static void virtio_vdpa_del_vq(struct virtqueue *vq)
+{
+	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vq->vdev);
+	struct vdpa_device *vdpa = vd_dev->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+	struct virtio_vdpa_vq_info *info = vq->priv;
+	unsigned int index = vq->index;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vd_dev->lock, flags);
+	list_del(&info->node);
+	spin_unlock_irqrestore(&vd_dev->lock, flags);
+
+	/* Select and deactivate the queue */
+	ops->set_vq_ready(vdpa, index, 0);
+	WARN_ON(ops->get_vq_ready(vdpa, index));
+
+	vring_del_virtqueue(vq);
+
+	kfree(info);
+}
+
+static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
+{
+	struct virtqueue *vq, *n;
+
+	list_for_each_entry_safe(vq, n, &vdev->vqs, list)
+		virtio_vdpa_del_vq(vq);
+}
+
+static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+				struct virtqueue *vqs[],
+				vq_callback_t *callbacks[],
+				const char * const names[],
+				const bool *ctx,
+				struct irq_affinity *desc)
+{
+	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+	struct vdpa_callback cb;
+	int i, err, queue_idx = 0;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			vqs[i] = NULL;
+			continue;
+		}
+
+		vqs[i] = virtio_vdpa_setup_vq(vdev, queue_idx++,
+					      callbacks[i], names[i], ctx ?
+ ctx[i] : false); + if (IS_ERR(vqs[i])) { + err = PTR_ERR(vqs[i]); + goto err_setup_vq; + } + } + + cb.callback = virtio_vdpa_config_cb; + cb.private = vd_dev; + ops->set_config_cb(vdpa, &cb); + + return 0; + +err_setup_vq: + virtio_vdpa_del_vqs(vdev); + return err; +} + +static u64 virtio_vdpa_get_features(struct virtio_device *vdev) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + return ops->get_features(vdpa); +} + +static int virtio_vdpa_finalize_features(struct virtio_device *vdev) +{ + struct vdpa_device *vdpa = vd_get_vdpa(vdev); + const struct vdpa_config_ops *ops = vdpa->config; + + /* Give virtio_ring a chance to accept features. */ + vring_transport_features(vdev); + + return ops->set_features(vdpa, vdev->features); +} + +static const char *virtio_vdpa_bus_name(struct virtio_device *vdev) +{ + struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev); + struct vdpa_device *vdpa = vd_dev->vdpa; + + return dev_name(&vdpa->dev); +} + +static const struct virtio_config_ops virtio_vdpa_config_ops = { + .get = virtio_vdpa_get, + .set = virtio_vdpa_set, + .generation = virtio_vdpa_generation, + .get_status = virtio_vdpa_get_status, + .set_status = virtio_vdpa_set_status, + .reset = virtio_vdpa_reset, + .find_vqs = virtio_vdpa_find_vqs, + .del_vqs = virtio_vdpa_del_vqs, + .get_features = virtio_vdpa_get_features, + .finalize_features = virtio_vdpa_finalize_features, + .bus_name = virtio_vdpa_bus_name, +}; + +static void virtio_vdpa_release_dev(struct device *_d) +{ + struct virtio_device *vdev = + container_of(_d, struct virtio_device, dev); + struct virtio_vdpa_device *vd_dev = + container_of(vdev, struct virtio_vdpa_device, vdev); + + kfree(vd_dev); +} + +static int virtio_vdpa_probe(struct vdpa_device *vdpa) +{ + const struct vdpa_config_ops *ops = vdpa->config; + struct virtio_vdpa_device *vd_dev, *reg_dev = NULL; + int ret = -EINVAL; + + vd_dev = kzalloc(sizeof(*vd_dev), GFP_KERNEL); + if (!vd_dev) + return -ENOMEM; + + vd_dev->vdev.dev.parent = vdpa_get_dma_dev(vdpa); + vd_dev->vdev.dev.release = virtio_vdpa_release_dev; + vd_dev->vdev.config = &virtio_vdpa_config_ops; + vd_dev->vdpa = vdpa; + INIT_LIST_HEAD(&vd_dev->virtqueues); + spin_lock_init(&vd_dev->lock); + + vd_dev->vdev.id.device = ops->get_device_id(vdpa); + if (vd_dev->vdev.id.device == 0) + goto err; + + vd_dev->vdev.id.vendor = ops->get_vendor_id(vdpa); + ret = register_virtio_device(&vd_dev->vdev); + reg_dev = vd_dev; + if (ret) + goto err; + + vdpa_set_drvdata(vdpa, vd_dev); + + return 0; + +err: + if (reg_dev) + put_device(&vd_dev->vdev.dev); + else + kfree(vd_dev); + return ret; +} + +static void virtio_vdpa_remove(struct vdpa_device *vdpa) +{ + struct virtio_vdpa_device *vd_dev = vdpa_get_drvdata(vdpa); + + unregister_virtio_device(&vd_dev->vdev); +} + +static struct vdpa_driver virtio_vdpa_driver = { + .driver = { + .name = "virtio_vdpa", + }, + .probe = virtio_vdpa_probe, + .remove = virtio_vdpa_remove, +}; + +module_vdpa_driver(virtio_vdpa_driver); + +MODULE_VERSION(MOD_VERSION); +MODULE_LICENSE(MOD_LICENSE); +MODULE_AUTHOR(MOD_AUTHOR); +MODULE_DESCRIPTION(MOD_DESC); From patchwork Thu Mar 26 14:01:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 221787 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=3.0 tests=DKIMWL_WL_HIGH, 
From: Jason Wang
To: mst@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org
Cc: jgg@mellanox.com, maxime.coquelin@redhat.com, cunming.liang@intel.com,
    zhihong.wang@intel.com, rob.miller@broadcom.com, xiao.w.wang@intel.com,
    lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com,
    parav@mellanox.com, kevin.tian@intel.com, stefanha@redhat.com,
    rdunlap@infradead.org, hch@infradead.org, aadam@redhat.com,
    jiri@mellanox.com, shahafs@mellanox.com, hanand@xilinx.com,
    mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com,
    vmireyno@marvell.com, zhangweining@ruijie.com.cn, Jason Wang
Subject: [PATCH V9 8/9] vdpasim: vDPA device simulator
Date: Thu, 26 Mar 2020 22:01:24 +0800
Message-Id: <20200326140125.19794-9-jasowang@redhat.com>
In-Reply-To: <20200326140125.19794-1-jasowang@redhat.com>
References: <20200326140125.19794-1-jasowang@redhat.com>

This patch implements a software vDPA networking device. The datapath
is implemented through vringh and a workqueue. The device has an
on-chip IOMMU which translates IOVA to PA. For kernel virtio drivers,
the vDPA simulator driver provides dma_ops. For vhost drivers, the
set_map() method of vdpa_config_ops is implemented to accept mappings
from vhost.

Currently, the vDPA device simulator loops TX traffic back to RX, so
the main use cases for the device are vDPA feature testing, prototyping
and development.

Note that there is no management API implemented yet; a vDPA device is
registered once the module is probed. We need to handle this in future
development.
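The on-chip IOMMU mentioned above can be pictured as an IOVA-to-address
lookup against the vhost_iotlb introduced in patch 3 of this series. A
hedged sketch of such a translation (the function is illustrative, not the
simulator's actual code):

	/* Illustrative: translate one IOVA through a vhost_iotlb "IOMMU". */
	static void *example_iova_to_va(struct vhost_iotlb *iommu, u64 iova,
					unsigned int perm)
	{
		struct vhost_iotlb_map *map;

		/* A one-byte window finds the map covering this IOVA, if any. */
		map = vhost_iotlb_itree_first(iommu, iova, iova);
		if (!map || !(map->perm & perm))
			return NULL;

		/* map->addr is where map->start is mapped; keep the offset. */
		return (void *)(uintptr_t)(map->addr + iova - map->start);
	}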
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/vdpa/Kconfig             |  19 +
 drivers/virtio/vdpa/Makefile            |   1 +
 drivers/virtio/vdpa/vdpa_sim/Makefile   |   2 +
 drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c | 629 ++++++++++++++++++++++++
 4 files changed, 651 insertions(+)
 create mode 100644 drivers/virtio/vdpa/vdpa_sim/Makefile
 create mode 100644 drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c

diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
index 351617723d12..ae204465964e 100644
--- a/drivers/virtio/vdpa/Kconfig
+++ b/drivers/virtio/vdpa/Kconfig
@@ -5,3 +5,22 @@ config VDPA
 	  Enable this module to support vDPA device that uses a
 	  datapath which complies with virtio specifications with
 	  vendor specific control path.
+
+menuconfig VDPA_MENU
+	bool "VDPA drivers"
+	default n
+
+if VDPA_MENU
+
+config VDPA_SIM
+	tristate "vDPA device simulator"
+	depends on RUNTIME_TESTING_MENU
+	select VDPA
+	select VHOST_RING
+	default n
+	help
+	  vDPA networking device simulator which loops TX traffic back
+	  to RX. This device is used for testing, prototyping and
+	  development of vDPA.
+
+endif # VDPA_MENU
diff --git a/drivers/virtio/vdpa/Makefile b/drivers/virtio/vdpa/Makefile
index ee6a35e8a4fb..3814af8e097b 100644
--- a/drivers/virtio/vdpa/Makefile
+++ b/drivers/virtio/vdpa/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
+obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
diff --git a/drivers/virtio/vdpa/vdpa_sim/Makefile b/drivers/virtio/vdpa/vdpa_sim/Makefile
new file mode 100644
index 000000000000..b40278f65e04
--- /dev/null
+++ b/drivers/virtio/vdpa/vdpa_sim/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_VDPA_SIM) += vdpa_sim.o
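Enabling the simulator requires the runtime testing menu it depends on;
a .config fragment along the following lines (option names taken from
the Kconfig hunk above; VDPA and VHOST_RING are selected automatically)
should be sufficient:

	CONFIG_RUNTIME_TESTING_MENU=y
	CONFIG_VDPA_MENU=y
	CONFIG_VDPA_SIM=m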
diff --git a/drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c b/drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c
new file mode 100644
index 000000000000..6e8a0cf2fdeb
--- /dev/null
+++ b/drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c
@@ -0,0 +1,629 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDPA networking device simulator.
+ *
+ * Copyright (c) 2020, Red Hat Inc. All rights reserved.
+ * Author: Jason Wang <jasowang@redhat.com>
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/uuid.h>
+#include <linux/iommu.h>
+#include <linux/dma-mapping.h>
+#include <linux/sysfs.h>
+#include <linux/file.h>
+#include <linux/etherdevice.h>
+#include <linux/vringh.h>
+#include <linux/vdpa.h>
+#include <linux/vhost_iotlb.h>
+#include <uapi/linux/virtio_config.h>
+#include <uapi/linux/virtio_net.h>
+
+#define DRV_VERSION  "0.1"
+#define DRV_AUTHOR   "Jason Wang <jasowang@redhat.com>"
+#define DRV_DESC     "vDPA Device Simulator"
+#define DRV_LICENSE  "GPL v2"
+
+struct vdpasim_virtqueue {
+	struct vringh vring;
+	struct vringh_kiov iov;
+	unsigned short head;
+	bool ready;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	u32 num;
+	void *private;
+	irqreturn_t (*cb)(void *data);
+};
+
+#define VDPASIM_QUEUE_ALIGN PAGE_SIZE
+#define VDPASIM_QUEUE_MAX 256
+#define VDPASIM_DEVICE_ID 0x1
+#define VDPASIM_VENDOR_ID 0
+#define VDPASIM_VQ_NUM 0x2
+#define VDPASIM_NAME "vdpasim-netdev"
+
+static u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
+			      (1ULL << VIRTIO_F_VERSION_1)  |
+			      (1ULL << VIRTIO_F_IOMMU_PLATFORM);
+
+/* State of each vdpasim device */
+struct vdpasim {
+	struct vdpa_device vdpa;
+	struct vdpasim_virtqueue vqs[2];
+	struct work_struct work;
+	/* spinlock to synchronize virtqueue state */
+	spinlock_t lock;
+	struct virtio_net_config config;
+	struct vhost_iotlb *iommu;
+	void *buffer;
+	u32 status;
+	u32 generation;
+	u64 features;
+};
+
+static struct vdpasim *vdpasim_dev;
+
+static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
+{
+	return container_of(vdpa, struct vdpasim, vdpa);
+}
+
+static struct vdpasim *dev_to_sim(struct device *dev)
+{
+	struct vdpa_device *vdpa = dev_to_vdpa(dev);
+
+	return vdpa_to_sim(vdpa);
+}
+
+static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
+{
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+	int ret;
+
+	ret = vringh_init_iotlb(&vq->vring, vdpasim_features,
+				VDPASIM_QUEUE_MAX, false,
+				(struct vring_desc *)(uintptr_t)vq->desc_addr,
+				(struct vring_avail *)
+				(uintptr_t)vq->driver_addr,
+				(struct vring_used *)
+				(uintptr_t)vq->device_addr);
+}
+
+static void vdpasim_vq_reset(struct vdpasim_virtqueue *vq)
+{
+	vq->ready = false;
+	vq->desc_addr = 0;
+	vq->driver_addr = 0;
+	vq->device_addr = 0;
+	vq->cb = NULL;
+	vq->private = NULL;
+	vringh_init_iotlb(&vq->vring, vdpasim_features, VDPASIM_QUEUE_MAX,
+			  false, NULL, NULL, NULL);
+}
+
+static void vdpasim_reset(struct vdpasim *vdpasim)
+{
+	int i;
+
+	for (i = 0; i < VDPASIM_VQ_NUM; i++)
+		vdpasim_vq_reset(&vdpasim->vqs[i]);
+
+	vhost_iotlb_reset(vdpasim->iommu);
+
+	vdpasim->features = 0;
+	vdpasim->status = 0;
+	++vdpasim->generation;
+}
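/*
 * Illustration only, not part of this patch: the helpers above are
 * driven through vdpa_config_ops by a bus driver such as virtio_vdpa.
 * A sketch of the bring-up sequence for one virtqueue; the ring
 * addresses are made up, real ones come from the virtio driver's
 * ring allocation.
 */
static void example_setup_vq(struct vdpa_device *vdpa, u16 idx,
			     struct vdpa_callback *cb)
{
	const struct vdpa_config_ops *ops = vdpa->config;

	ops->set_vq_num(vdpa, idx, VDPASIM_QUEUE_MAX);
	ops->set_vq_cb(vdpa, idx, cb);
	/* Ring areas must be translatable by the device IOMMU. */
	ops->set_vq_address(vdpa, idx, 0x1000, 0x2000, 0x3000);
	/* Triggers vdpasim_queue_ready() -> vringh_init_iotlb(). */
	ops->set_vq_ready(vdpa, idx, true);
	ops->kick_vq(vdpa, idx);
}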
+static void vdpasim_work(struct work_struct *work)
+{
+	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+	struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
+	struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
+	ssize_t read, write;
+	size_t total_write;
+	int err;
+	int pkts = 0;
+
+	spin_lock(&vdpasim->lock);
+
+	if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
+		goto out;
+
+	if (!txq->ready || !rxq->ready)
+		goto out;
+
+	while (true) {
+		total_write = 0;
+		err = vringh_getdesc_iotlb(&txq->vring, &txq->iov, NULL,
+					   &txq->head, GFP_ATOMIC);
+		if (err <= 0)
+			break;
+
+		err = vringh_getdesc_iotlb(&rxq->vring, NULL, &rxq->iov,
+					   &rxq->head, GFP_ATOMIC);
+		if (err <= 0) {
+			vringh_complete_iotlb(&txq->vring, txq->head, 0);
+			break;
+		}
+
+		while (true) {
+			read = vringh_iov_pull_iotlb(&txq->vring, &txq->iov,
+						     vdpasim->buffer,
+						     PAGE_SIZE);
+			if (read <= 0)
+				break;
+
+			write = vringh_iov_push_iotlb(&rxq->vring, &rxq->iov,
+						      vdpasim->buffer, read);
+			if (write <= 0)
+				break;
+
+			total_write += write;
+		}
+
+		/* Make sure data is written before advancing the index */
+		smp_wmb();
+
+		vringh_complete_iotlb(&txq->vring, txq->head, 0);
+		vringh_complete_iotlb(&rxq->vring, rxq->head, total_write);
+
+		/* Make sure used is visible before raising the interrupt. */
+		smp_wmb();
+
+		local_bh_disable();
+		if (txq->cb)
+			txq->cb(txq->private);
+		if (rxq->cb)
+			rxq->cb(rxq->private);
+		local_bh_enable();
+
+		if (++pkts > 4) {
+			schedule_work(&vdpasim->work);
+			goto out;
+		}
+	}
+
+out:
+	spin_unlock(&vdpasim->lock);
+}
+
+static int dir_to_perm(enum dma_data_direction dir)
+{
+	int perm = -EFAULT;
+
+	switch (dir) {
+	case DMA_FROM_DEVICE:
+		perm = VHOST_MAP_WO;
+		break;
+	case DMA_TO_DEVICE:
+		perm = VHOST_MAP_RO;
+		break;
+	case DMA_BIDIRECTIONAL:
+		perm = VHOST_MAP_RW;
+		break;
+	default:
+		break;
+	}
+
+	return perm;
+}
+
+static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   unsigned long attrs)
+{
+	struct vdpasim *vdpasim = dev_to_sim(dev);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+	u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
+	int ret, perm = dir_to_perm(dir);
+
+	if (perm < 0)
+		return DMA_MAPPING_ERROR;
+
+	/* For simplicity, use an identity mapping to avoid e.g. an
+	 * IOVA allocator.
+	 */
+	ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
+				    pa, perm);
+	if (ret)
+		return DMA_MAPPING_ERROR;
+
+	return (dma_addr_t)(pa);
+}
+
+static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
+			       size_t size, enum dma_data_direction dir,
+			       unsigned long attrs)
+{
+	struct vdpasim *vdpasim = dev_to_sim(dev);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+
+	vhost_iotlb_del_range(iommu, (u64)dma_addr,
+			      (u64)dma_addr + size - 1);
+}
+
+static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
+				    dma_addr_t *dma_addr, gfp_t flag,
+				    unsigned long attrs)
+{
+	struct vdpasim *vdpasim = dev_to_sim(dev);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+	void *addr = kmalloc(size, flag);
+	int ret;
+
+	if (!addr) {
+		*dma_addr = DMA_MAPPING_ERROR;
+	} else {
+		u64 pa = virt_to_phys(addr);
+
+		ret = vhost_iotlb_add_range(iommu, (u64)pa,
+					    (u64)pa + size - 1,
+					    pa, VHOST_MAP_RW);
+		if (ret) {
+			*dma_addr = DMA_MAPPING_ERROR;
+			kfree(addr);
+			addr = NULL;
+		} else {
+			*dma_addr = (dma_addr_t)pa;
+		}
+	}
+
+	return addr;
+}
+
+static void vdpasim_free_coherent(struct device *dev, size_t size,
+				  void *vaddr, dma_addr_t dma_addr,
+				  unsigned long attrs)
+{
+	struct vdpasim *vdpasim = dev_to_sim(dev);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+
+	vhost_iotlb_del_range(iommu, (u64)dma_addr,
+			      (u64)dma_addr + size - 1);
+	kfree(phys_to_virt((uintptr_t)dma_addr));
+}
+
+static const struct dma_map_ops vdpasim_dma_ops = {
+	.map_page = vdpasim_map_page,
+	.unmap_page = vdpasim_unmap_page,
+	.alloc = vdpasim_alloc_coherent,
+	.free = vdpasim_free_coherent,
+};
+
+static const struct vdpa_config_ops vdpasim_net_config_ops;
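/*
 * Illustration only, not part of this patch: because vdpasim_create()
 * below installs vdpasim_dma_ops with set_dma_ops(), an ordinary
 * streaming-DMA call from a kernel virtio driver lands in the handlers
 * above, which just add and remove identity (IOVA == PA) entries in
 * the vhost IOTLB.
 */
static int example_dma_roundtrip(struct device *dev, struct page *page)
{
	dma_addr_t dma;

	dma = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))	/* vdpasim_map_page() failed */
		return -ENOMEM;

	/* ... the device may now pull the buffer via vringh ... */

	dma_unmap_page(dev, dma, PAGE_SIZE, DMA_TO_DEVICE);
	return 0;
}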
+static struct vdpasim *vdpasim_create(void)
+{
+	struct virtio_net_config *config;
+	struct vdpasim *vdpasim;
+	struct device *dev;
+	int ret = -ENOMEM;
+
+	vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL,
+				    &vdpasim_net_config_ops);
+	if (!vdpasim)
+		goto err_alloc;
+
+	INIT_WORK(&vdpasim->work, vdpasim_work);
+	spin_lock_init(&vdpasim->lock);
+
+	dev = &vdpasim->vdpa.dev;
+	dev->coherent_dma_mask = DMA_BIT_MASK(64);
+	set_dma_ops(dev, &vdpasim_dma_ops);
+
+	vdpasim->iommu = vhost_iotlb_alloc(2048, 0);
+	if (!vdpasim->iommu)
+		goto err_iommu;
+
+	vdpasim->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!vdpasim->buffer)
+		goto err_iommu;
+
+	config = &vdpasim->config;
+	config->mtu = 1500;
+	config->status = VIRTIO_NET_S_LINK_UP;
+	eth_random_addr(config->mac);
+
+	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
+	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
+
+	vdpasim->vdpa.dma_dev = dev;
+	ret = vdpa_register_device(&vdpasim->vdpa);
+	if (ret)
+		goto err_iommu;
+
+	return vdpasim;
+
+err_iommu:
+	put_device(dev);
+err_alloc:
+	return ERR_PTR(ret);
+}
+
+static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
+				  u64 desc_area, u64 driver_area,
+				  u64 device_area)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	vq->desc_addr = desc_area;
+	vq->driver_addr = driver_area;
+	vq->device_addr = device_area;
+
+	return 0;
+}
+
+static void vdpasim_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	vq->num = num;
+}
+
+static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	if (vq->ready)
+		schedule_work(&vdpasim->work);
+}
+
+static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
+			      struct vdpa_callback *cb)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	vq->cb = cb->callback;
+	vq->private = cb->private;
+}
+
+static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	spin_lock(&vdpasim->lock);
+	vq->ready = ready;
+	if (vq->ready)
+		vdpasim_queue_ready(vdpasim, idx);
+	spin_unlock(&vdpasim->lock);
+}
+
+static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	return vq->ready;
+}
+
+static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx, u64 state)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+	struct vringh *vrh = &vq->vring;
+
+	spin_lock(&vdpasim->lock);
+	vrh->last_avail_idx = state;
+	spin_unlock(&vdpasim->lock);
+
+	return 0;
+}
+
+static u64 vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+	struct vringh *vrh = &vq->vring;
+
+	return vrh->last_avail_idx;
+}
+
+static u16 vdpasim_get_vq_align(struct vdpa_device *vdpa)
+{
+	return VDPASIM_QUEUE_ALIGN;
+}
+
+static u64 vdpasim_get_features(struct vdpa_device *vdpa)
+{
+	return vdpasim_features;
+}
+
+static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	/* DMA mapping must be done by driver */
+	if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+		return -EINVAL;
+
+	vdpasim->features = features & vdpasim_features;
+
+	return 0;
+}
+
+static void vdpasim_set_config_cb(struct vdpa_device *vdpa,
+				  struct vdpa_callback *cb)
+{
+	/* We don't support config interrupt */
+}
+
+static u16 vdpasim_get_vq_num_max(struct vdpa_device *vdpa)
+{
+	return VDPASIM_QUEUE_MAX;
+}
+
+static u32 vdpasim_get_device_id(struct vdpa_device *vdpa)
+{
+	return VDPASIM_DEVICE_ID;
+}
+
+static u32 vdpasim_get_vendor_id(struct vdpa_device *vdpa)
+{
+	return VDPASIM_VENDOR_ID;
+}
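/*
 * Illustration only, not part of this patch: a transport negotiates
 * features against the ops above roughly as follows. Note that
 * vdpasim_set_features() fails with -EINVAL unless
 * VIRTIO_F_IOMMU_PLATFORM survives the intersection, since all of the
 * simulator's DMA goes through the IOTLB.
 */
static int example_negotiate(struct vdpa_device *vdpa, u64 driver_features)
{
	const struct vdpa_config_ops *ops = vdpa->config;
	u64 features = ops->get_features(vdpa) & driver_features;

	return ops->set_features(vdpa, features);
}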
+static u8 vdpasim_get_status(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	u8 status;
+
+	spin_lock(&vdpasim->lock);
+	status = vdpasim->status;
+	spin_unlock(&vdpasim->lock);
+
+	return status;
+}
+
+static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	spin_lock(&vdpasim->lock);
+	vdpasim->status = status;
+	if (status == 0)
+		vdpasim_reset(vdpasim);
+	spin_unlock(&vdpasim->lock);
+}
+
+static void vdpasim_get_config(struct vdpa_device *vdpa, unsigned int offset,
+			       void *buf, unsigned int len)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	if (offset + len < sizeof(struct virtio_net_config))
+		memcpy(buf, (u8 *)&vdpasim->config + offset, len);
+}
+
+static void vdpasim_set_config(struct vdpa_device *vdpa, unsigned int offset,
+			       const void *buf, unsigned int len)
+{
+	/* No writable config supported by vdpasim */
+}
+
+static u32 vdpasim_get_generation(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	return vdpasim->generation;
+}
+
+static int vdpasim_set_map(struct vdpa_device *vdpa,
+			   struct vhost_iotlb *iotlb)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vhost_iotlb_map *map;
+	u64 start = 0ULL, last = 0ULL - 1;
+	int ret;
+
+	vhost_iotlb_reset(vdpasim->iommu);
+
+	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+	     map = vhost_iotlb_itree_next(map, start, last)) {
+		ret = vhost_iotlb_add_range(vdpasim->iommu, map->start,
+					    map->last, map->addr, map->perm);
+		if (ret)
+			goto err;
+	}
+	return 0;
+
+err:
+	vhost_iotlb_reset(vdpasim->iommu);
+	return ret;
+}
+
+static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
+			   u64 pa, u32 perm)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	return vhost_iotlb_add_range(vdpasim->iommu, iova,
+				     iova + size - 1, pa, perm);
+}
+
+static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	vhost_iotlb_del_range(vdpasim->iommu, iova, iova + size - 1);
+
+	return 0;
+}
+
+static void vdpasim_free(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	cancel_work_sync(&vdpasim->work);
+	kfree(vdpasim->buffer);
+	if (vdpasim->iommu)
+		vhost_iotlb_free(vdpasim->iommu);
+}
+
+static const struct vdpa_config_ops vdpasim_net_config_ops = {
+	.set_vq_address = vdpasim_set_vq_address,
+	.set_vq_num = vdpasim_set_vq_num,
+	.kick_vq = vdpasim_kick_vq,
+	.set_vq_cb = vdpasim_set_vq_cb,
+	.set_vq_ready = vdpasim_set_vq_ready,
+	.get_vq_ready = vdpasim_get_vq_ready,
+	.set_vq_state = vdpasim_set_vq_state,
+	.get_vq_state = vdpasim_get_vq_state,
+	.get_vq_align = vdpasim_get_vq_align,
+	.get_features = vdpasim_get_features,
+	.set_features = vdpasim_set_features,
+	.set_config_cb = vdpasim_set_config_cb,
+	.get_vq_num_max = vdpasim_get_vq_num_max,
+	.get_device_id = vdpasim_get_device_id,
+	.get_vendor_id = vdpasim_get_vendor_id,
+	.get_status = vdpasim_get_status,
+	.set_status = vdpasim_set_status,
+	.get_config = vdpasim_get_config,
+	.set_config = vdpasim_set_config,
+	.get_generation = vdpasim_get_generation,
+	.set_map = vdpasim_set_map,
+	.dma_map = vdpasim_dma_map,
+	.dma_unmap = vdpasim_dma_unmap,
+	.free = vdpasim_free,
+};
+
+static int __init vdpasim_dev_init(void)
+{
+	vdpasim_dev = vdpasim_create();
+
+	if (!IS_ERR(vdpasim_dev))
+		return 0;
+
+	return PTR_ERR(vdpasim_dev);
+}
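/*
 * Illustration only, not part of this patch: vdpasim_dev_init() above
 * relies on vdpasim_create() returning either a valid pointer or an
 * ERR_PTR() value. Written with an explicit local, the same logic
 * reads:
 */
static int __init example_dev_init(void)
{
	struct vdpasim *dev = vdpasim_create();

	if (IS_ERR(dev))
		return PTR_ERR(dev);

	vdpasim_dev = dev;
	return 0;
}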
+static void __exit vdpasim_dev_exit(void)
+{
+	struct vdpa_device *vdpa = &vdpasim_dev->vdpa;
+
+	vdpa_unregister_device(vdpa);
+}
+
+module_init(vdpasim_dev_init)
+module_exit(vdpasim_dev_exit)
+
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE(DRV_LICENSE);
+MODULE_AUTHOR(DRV_AUTHOR);
+MODULE_DESCRIPTION(DRV_DESC);