From patchwork Tue Apr 28 19:57:00 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 283845
From: Peter Xu
To: qemu-devel@nongnu.org
Subject: [PATCH RFC v2 2/9] linux-headers: Update
Date: Tue, 28 Apr 2020 15:57:00 -0400
Message-Id: <20200428195707.11980-3-peterx@redhat.com>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200428195707.11980-1-peterx@redhat.com>
References: <20200428195707.11980-1-peterx@redhat.com>
Cc: Paolo Bonzini , "Dr . 
David Alan Gilbert" , peterx@redhat.com

Signed-off-by: Peter Xu
---
 include/standard-headers/linux/ethtool.h      |  10 +-
 .../linux/input-event-codes.h                 |   5 +-
 include/standard-headers/linux/pci_regs.h     |   2 +
 .../standard-headers/linux/virtio_balloon.h   |   1 +
 include/standard-headers/linux/virtio_ids.h   |   1 +
 linux-headers/COPYING                         |   2 +
 linux-headers/asm-x86/kvm.h                   |   2 +
 linux-headers/asm-x86/unistd_32.h             |   1 +
 linux-headers/asm-x86/unistd_64.h             |   1 +
 linux-headers/asm-x86/unistd_x32.h            |   1 +
 linux-headers/linux/kvm.h                     | 100 +++++++++++++++++-
 linux-headers/linux/mman.h                    |   5 +-
 linux-headers/linux/userfaultfd.h             |  40 +++++--
 linux-headers/linux/vfio.h                    |  37 +++++++
 14 files changed, 195 insertions(+), 13 deletions(-)

diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 8adf3b018b..1200890c86 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -596,6 +596,9 @@ struct ethtool_pauseparam {
  * @ETH_SS_LINK_MODES: link mode names
  * @ETH_SS_MSG_CLASSES: debug message class names
  * @ETH_SS_WOL_MODES: wake-on-lan modes
+ * @ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags
+ * @ETH_SS_TS_TX_TYPES: timestamping Tx types
+ * @ETH_SS_TS_RX_FILTERS: timestamping Rx filters
  */
 enum ethtool_stringset {
 	ETH_SS_TEST = 0,
@@ -610,6 +613,9 @@ enum ethtool_stringset {
 	ETH_SS_LINK_MODES,
 	ETH_SS_MSG_CLASSES,
 	ETH_SS_WOL_MODES,
+	ETH_SS_SOF_TIMESTAMPING,
+	ETH_SS_TS_TX_TYPES,
+	ETH_SS_TS_RX_FILTERS,
 
 	/* add new constants above here */
 	ETH_SS_COUNT
@@ -1330,6 +1336,7 @@ enum ethtool_fec_config_bits {
 	ETHTOOL_FEC_OFF_BIT,
 	ETHTOOL_FEC_RS_BIT,
 	ETHTOOL_FEC_BASER_BIT,
+	ETHTOOL_FEC_LLRS_BIT,
 };
 
 #define ETHTOOL_FEC_NONE (1 << ETHTOOL_FEC_NONE_BIT)
@@ -1337,6 +1344,7 @@ enum ethtool_fec_config_bits {
 #define ETHTOOL_FEC_OFF (1 << ETHTOOL_FEC_OFF_BIT)
 #define ETHTOOL_FEC_RS (1 << ETHTOOL_FEC_RS_BIT)
 #define ETHTOOL_FEC_BASER (1 << ETHTOOL_FEC_BASER_BIT)
+#define ETHTOOL_FEC_LLRS (1 << ETHTOOL_FEC_LLRS_BIT)
 
 /* CMDs currently supported */
 #define ETHTOOL_GSET 0x00000001 /* DEPRECATED, Get settings.
@@ -1521,7 +1529,7 @@ enum ethtool_link_mode_bit_indices {
 	ETHTOOL_LINK_MODE_400000baseLR8_ER8_FR8_Full_BIT = 71,
 	ETHTOOL_LINK_MODE_400000baseDR8_Full_BIT = 72,
 	ETHTOOL_LINK_MODE_400000baseCR8_Full_BIT = 73,
-
+	ETHTOOL_LINK_MODE_FEC_LLRS_BIT = 74,
 	/* must be last entry */
 	__ETHTOOL_LINK_MODE_MASK_NBITS
 };
diff --git a/include/standard-headers/linux/input-event-codes.h b/include/standard-headers/linux/input-event-codes.h
index b484c25289..ebf72c1031 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
 /*
  * Input event codes
  *
@@ -652,6 +652,9 @@
 /* Electronic privacy screen control */
 #define KEY_PRIVACY_SCREEN_TOGGLE 0x279
 
+/* Select an area of screen to be copied */
+#define KEY_SELECTIVE_SCREENSHOT 0x27a
+
 /*
  * Some keyboards have keys which do not have a defined meaning, these keys
  * are intended to be programmed / bound to macros by the user. For most
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index 5437690483..f9701410d3 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -605,6 +605,7 @@
 #define PCI_EXP_SLTCTL_PWR_OFF 0x0400 /* Power Off */
 #define PCI_EXP_SLTCTL_EIC 0x0800 /* Electromechanical Interlock Control */
 #define PCI_EXP_SLTCTL_DLLSCE 0x1000 /* Data Link Layer State Changed Enable */
+#define PCI_EXP_SLTCTL_IBPD_DISABLE 0x4000 /* In-band PD disable */
 #define PCI_EXP_SLTSTA 26 /* Slot Status */
 #define PCI_EXP_SLTSTA_ABP 0x0001 /* Attention Button Pressed */
 #define PCI_EXP_SLTSTA_PFD 0x0002 /* Power Fault Detected */
@@ -680,6 +681,7 @@
 #define PCI_EXP_LNKSTA2 50 /* Link Status 2 */
 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */
 #define PCI_EXP_SLTCAP2 52 /* Slot Capabilities 2 */
+#define PCI_EXP_SLTCAP2_IBPD 0x00000001 /* In-band PD Disable Supported */
 #define PCI_EXP_SLTCTL2 56 /* Slot Control 2 */
 #define PCI_EXP_SLTSTA2 58 /* Slot Status 2 */
 
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70..1c5f6d6f2d 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 585e07b273..ecc27a1740 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -46,5 +46,6 @@
 #define VIRTIO_ID_IOMMU 23 /* virtio IOMMU */
 #define VIRTIO_ID_FS 26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM 27 /* virtio pmem */
+#define VIRTIO_ID_MAC80211_HWSIM 29 /* virtio mac80211-hwsim */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/linux-headers/COPYING b/linux-headers/COPYING
index da4cb28feb..a635a38ef9 100644
--- a/linux-headers/COPYING
+++ b/linux-headers/COPYING
@@ -16,3 +16,5 @@ In addition, other licenses may also apply. Please see:
 	Documentation/process/license-rules.rst
 
 for more details.
+
+All contributions to the Linux Kernel are subject to this COPYING file.
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 503d3f42da..99b15ce39e 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -12,6 +12,7 @@
 
 #define KVM_PIO_PAGE_OFFSET 1
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define DE_VECTOR 0
 #define DB_VECTOR 1
@@ -390,6 +391,7 @@ struct kvm_sync_regs {
 #define KVM_STATE_NESTED_GUEST_MODE 0x00000001
 #define KVM_STATE_NESTED_RUN_PENDING 0x00000002
 #define KVM_STATE_NESTED_EVMCS 0x00000004
+#define KVM_STATE_NESTED_MTF_PENDING 0x00000008
 
 #define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001
 #define KVM_STATE_NESTED_SMM_VMXON 0x00000002
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index f6e06fcfbd..1e6c1a5867 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -429,4 +429,5 @@
 #define __NR_openat2 437
 #define __NR_pidfd_getfd 438
 
+
 #endif /* _ASM_X86_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index 924f826d2d..6daf0aecb2 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -351,4 +351,5 @@
 #define __NR_openat2 437
 #define __NR_pidfd_getfd 438
 
+
 #endif /* _ASM_X86_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index 010307757b..e3f17ef370 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -340,4 +340,5 @@
 #define __NR_preadv2 (__X32_SYSCALL_BIT + 546)
 #define __NR_pwritev2 (__X32_SYSCALL_BIT + 547)
 
+
 #endif /* _ASM_X86_UNISTD_X32_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 265099100e..f0f3cecce1 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -236,6 +236,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_IOAPIC_EOI 26
 #define KVM_EXIT_HYPERV 27
 #define KVM_EXIT_ARM_NISV 28
+#define KVM_EXIT_DIRTY_RING_FULL 29
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -474,12 +475,17 @@ struct kvm_s390_mem_op {
 	__u32 size; /* amount of bytes */
 	__u32 op; /* type of operation */
 	__u64 buf; /* buffer in userspace */
-	__u8 ar; /* the access register number */
-	__u8 reserved[31]; /* should be set to 0 */
+	union {
+		__u8 ar; /* the access register number */
+		__u32 sida_offset; /* offset into the sida */
+		__u8 reserved[32]; /* should be set to 0 */
+	};
 };
 /* types for kvm_s390_mem_op->op */
 #define KVM_S390_MEMOP_LOGICAL_READ 0
 #define KVM_S390_MEMOP_LOGICAL_WRITE 1
+#define KVM_S390_MEMOP_SIDA_READ 2
+#define KVM_S390_MEMOP_SIDA_WRITE 3
 /* flags for kvm_s390_mem_op->flags */
 #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0)
 #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1)
@@ -1010,6 +1016,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_NISV_TO_USER 177
 #define KVM_CAP_ARM_INJECT_EXT_DABT 178
 #define KVM_CAP_S390_VCPU_RESETS 179
+#define KVM_CAP_S390_PROTECTED 180
+#define KVM_CAP_PPC_SECURE_GUEST 181
+#define KVM_CAP_DIRTY_LOG_RING 182
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1478,6 +1487,42 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
 #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
 
+struct kvm_s390_pv_sec_parm {
+	__u64 origin;
+	__u64 length;
+};
+
+struct kvm_s390_pv_unp {
+	__u64 addr;
+	__u64 size;
+	__u64 tweak;
+};
+
+enum pv_cmd_id {
+	KVM_PV_ENABLE,
+	KVM_PV_DISABLE,
+	KVM_PV_SET_SEC_PARMS,
+	KVM_PV_UNPACK,
+	KVM_PV_VERIFY,
+	KVM_PV_PREP_RESET,
+	KVM_PV_UNSHARE_ALL,
+};
+
+struct kvm_pv_cmd {
+	__u32 cmd; /* Command to be executed */
+	__u16 rc; /* Ultravisor return code */
+	__u16 rrc; /* Ultravisor return reason code */
+	__u64 data; /* Data or address */
+	__u32 flags; /* flags for future extensions. Must be 0 for now */
+	__u32 reserved[3];
+};
+
+/* Available with KVM_CAP_S390_PROTECTED */
+#define KVM_S390_PV_COMMAND _IOWR(KVMIO, 0xc5, struct kvm_pv_cmd)
+
+/* Available with KVM_CAP_DIRTY_LOG_RING */
+#define KVM_RESET_DIRTY_RINGS _IO(KVMIO, 0xc6)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
@@ -1628,4 +1673,55 @@ struct kvm_hyperv_eventfd {
 #define KVM_HYPERV_CONN_ID_MASK 0x00ffffff
 #define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0)
 
+#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0)
+#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
+
+/*
+ * Arch needs to define the macro after implementing the dirty ring
+ * feature. KVM_DIRTY_LOG_PAGE_OFFSET should be defined as the
+ * starting page offset of the dirty ring structures.
+ */
+#ifndef KVM_DIRTY_LOG_PAGE_OFFSET
+#define KVM_DIRTY_LOG_PAGE_OFFSET 0
+#endif
+
+/*
+ * KVM dirty GFN flags, defined as:
+ *
+ * |---------------+---------------+--------------|
+ * | bit 1 (reset) | bit 0 (dirty) | Status       |
+ * |---------------+---------------+--------------|
+ * |             0 |             0 | Invalid GFN  |
+ * |             0 |             1 | Dirty GFN    |
+ * |             1 |             X | GFN to reset |
+ * |---------------+---------------+--------------|
+ *
+ * Lifecycle of a dirty GFN goes like:
+ *
+ *      dirtied         collected        reset
+ * 00 -----------> 01 -------------> 1X -------+
+ *  ^                                          |
+ *  |                                          |
+ *  +------------------------------------------+
+ *
+ * The userspace program is only responsible for the 01->1X state
+ * conversion (to collect dirty bits). Also, it must not skip any
+ * dirty bits so that dirty bits are always collected in sequence.
+ */
+#define KVM_DIRTY_GFN_F_DIRTY BIT(0)
+#define KVM_DIRTY_GFN_F_RESET BIT(1)
+#define KVM_DIRTY_GFN_F_MASK 0x3
+
+/*
+ * KVM dirty rings should be mapped at KVM_DIRTY_LOG_PAGE_OFFSET of
+ * per-vcpu mmaped regions as an array of struct kvm_dirty_gfn. The
+ * size of the gfn buffer is decided by the first argument when
+ * enabling KVM_CAP_DIRTY_LOG_RING.
+ */
+struct kvm_dirty_gfn {
+	__u32 flags;
+	__u32 slot;
+	__u64 offset;
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/linux-headers/linux/mman.h b/linux-headers/linux/mman.h
index 1f6e2cd89c..51ea363759 100644
--- a/linux-headers/linux/mman.h
+++ b/linux-headers/linux/mman.h
@@ -5,8 +5,9 @@
 #include
 #include
 
-#define MREMAP_MAYMOVE	1
-#define MREMAP_FIXED	2
+#define MREMAP_MAYMOVE		1
+#define MREMAP_FIXED		2
+#define MREMAP_DONTUNMAP	4
 
 #define OVERCOMMIT_GUESS 0
 #define OVERCOMMIT_ALWAYS 1
diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index ce78878d12..8d3996eb82 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -19,7 +19,8 @@
  * means the userland is reading).
  */
 #define UFFD_API ((__u64)0xAA)
-#define UFFD_API_FEATURES (UFFD_FEATURE_EVENT_FORK | \
+#define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \
+			   UFFD_FEATURE_EVENT_FORK | \
 			   UFFD_FEATURE_EVENT_REMAP | \
 			   UFFD_FEATURE_EVENT_REMOVE | \
 			   UFFD_FEATURE_EVENT_UNMAP | \
@@ -34,7 +35,8 @@
 #define UFFD_API_RANGE_IOCTLS \
 	((__u64)1 << _UFFDIO_WAKE | \
 	 (__u64)1 << _UFFDIO_COPY | \
-	 (__u64)1 << _UFFDIO_ZEROPAGE)
+	 (__u64)1 << _UFFDIO_ZEROPAGE | \
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 #define UFFD_API_RANGE_IOCTLS_BASIC \
 	((__u64)1 << _UFFDIO_WAKE | \
 	 (__u64)1 << _UFFDIO_COPY)
@@ -52,6 +54,7 @@
 #define _UFFDIO_WAKE (0x02)
 #define _UFFDIO_COPY (0x03)
 #define _UFFDIO_ZEROPAGE (0x04)
+#define _UFFDIO_WRITEPROTECT (0x06)
 #define _UFFDIO_API (0x3F)
 
 /* userfaultfd ioctl ids */
@@ -68,6 +71,8 @@
 				      struct uffdio_copy)
 #define UFFDIO_ZEROPAGE _IOWR(UFFDIO, _UFFDIO_ZEROPAGE, \
 				      struct uffdio_zeropage)
+#define UFFDIO_WRITEPROTECT _IOWR(UFFDIO, _UFFDIO_WRITEPROTECT, \
+				      struct uffdio_writeprotect)
 
 /* read() structure */
 struct uffd_msg {
@@ -203,13 +208,14 @@ struct uffdio_copy {
 	__u64 dst;
 	__u64 src;
 	__u64 len;
+#define UFFDIO_COPY_MODE_DONTWAKE ((__u64)1<<0)
 	/*
-	 * There will be a wrprotection flag later that allows to map
-	 * pages wrprotected on the fly. And such a flag will be
-	 * available if the wrprotection ioctl are implemented for the
-	 * range according to the uffdio_register.ioctls.
+	 * UFFDIO_COPY_MODE_WP will map the page write protected on
+	 * the fly. UFFDIO_COPY_MODE_WP is available only if the
+	 * write protected ioctl is implemented for the range
+	 * according to the uffdio_register.ioctls.
 	 */
-#define UFFDIO_COPY_MODE_DONTWAKE ((__u64)1<<0)
+#define UFFDIO_COPY_MODE_WP ((__u64)1<<1)
 	__u64 mode;
 
 	/*
@@ -231,4 +237,24 @@ struct uffdio_zeropage {
 	__s64 zeropage;
 };
 
+struct uffdio_writeprotect {
+	struct uffdio_range range;
+/*
+ * UFFDIO_WRITEPROTECT_MODE_WP: set the flag to write protect a range,
+ * unset the flag to undo protection of a range which was previously
+ * write protected.
+ *
+ * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up
+ * any wait thread after the operation succeeds.
+ *
+ * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
+ * therefore DONTWAKE flag is meaningless with WP=1. Removing write
+ * protection (WP=0) in response to a page fault wakes the faulting
+ * task unless DONTWAKE is set.
+ */
+#define UFFDIO_WRITEPROTECT_MODE_WP ((__u64)1<<0)
+#define UFFDIO_WRITEPROTECT_MODE_DONTWAKE ((__u64)1<<1)
+	__u64 mode;
+};
+
 #endif /* _LINUX_USERFAULTFD_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index fb10370d29..a41c452865 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -707,6 +707,43 @@ struct vfio_device_ioeventfd {
 
 #define VFIO_DEVICE_IOEVENTFD _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
+ *                             struct vfio_device_feature)
+ *
+ * Get, set, or probe feature data of the device. The feature is selected
+ * using the FEATURE_MASK portion of the flags field. Support for a feature
+ * can be probed by setting both the FEATURE_MASK and PROBE bits. A probe
+ * may optionally include the GET and/or SET bits to determine read vs write
+ * access of the feature respectively. Probing a feature will return success
+ * if the feature is supported and all of the optionally indicated GET/SET
+ * methods are supported. The format of the data portion of the structure is
+ * specific to the given feature. The data portion is not required for
+ * probing. GET and SET are mutually exclusive, except for use with PROBE.
+ *
+ * Return 0 on success, -errno on failure.
+ */
+struct vfio_device_feature {
+	__u32 argsz;
+	__u32 flags;
+#define VFIO_DEVICE_FEATURE_MASK (0xffff) /* 16-bit feature index */
+#define VFIO_DEVICE_FEATURE_GET (1 << 16) /* Get feature into data[] */
+#define VFIO_DEVICE_FEATURE_SET (1 << 17) /* Set feature from data[] */
+#define VFIO_DEVICE_FEATURE_PROBE (1 << 18) /* Probe feature support */
+	__u8 data[];
+};
+
+#define VFIO_DEVICE_FEATURE _IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/*
+ * Provide support for setting a PCI VF Token, which is used as a shared
+ * secret between PF and VF drivers. This feature may only be set on a
+ * PCI SR-IOV PF when SR-IOV is enabled on the PF and there are no existing
+ * open VFs. Data provided when setting this feature is a 16-byte array
+ * (__u8 b[16]), representing a UUID.
+ */
+#define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN (0)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**

From patchwork Tue Apr 28 20:05:04 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 283841
From: Peter Xu
To: qemu-devel@nongnu.org
Subject: [PATCH RFC v2 4/9] KVM: Create the KVMSlot dirty bitmap on flag changes
Date: Tue, 28 Apr 2020 16:05:04 -0400
Message-Id: <20200428200509.13150-2-peterx@redhat.com>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200428195707.11980-1-peterx@redhat.com>
References: <20200428195707.11980-1-peterx@redhat.com>
Cc: pbonzini@redhat.com, dgilbert@redhat.com, Peter Xu

Previously, two places created the per-KVMSlot dirty bitmap:

  1. When a newly created KVMSlot has dirty logging enabled,
  2. When the first log_sync() happens for a memory slot.

The second case is lazy initialization, while the first is not (it was
added as a fix for what the second case missed).

To initialize dirty bitmaps explicitly, the missing piece is creating
the dirty bitmap when a slot changes from not-dirty-tracked to
dirty-tracked. Do that in kvm_slot_update_flags().

With that, the lazy initialization in the second case can be safely
removed.

This change is needed for the KVM dirty ring, because the dirty ring
does not use the log_sync() interface at all.

Also move all the pre-checks into kvm_slot_init_dirty_bitmap().

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
---
 accel/kvm/kvm-all.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 1f1fd5316e..dc6371b8b2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -162,6 +162,8 @@ static NotifierList kvm_irqchip_change_notifiers =
 #define kvm_slots_lock(kml) qemu_mutex_lock(&(kml)->slots_lock)
 #define kvm_slots_unlock(kml) qemu_mutex_unlock(&(kml)->slots_lock)
 
+static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
+
 int kvm_get_max_memslots(void)
 {
     KVMState *s = KVM_STATE(current_accel());
@@ -452,6 +454,7 @@ static int kvm_slot_update_flags(KVMMemoryListener *kml, KVMSlot *mem,
         return 0;
     }
 
+    kvm_slot_init_dirty_bitmap(mem);
     return kvm_set_user_memory_region(kml, mem, false);
 }
 
@@ -536,8 +539,12 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
 #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
 
 /* Allocate the dirty bitmap for a slot */
-static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
+static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
 {
+    if (!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) || mem->dirty_bmap) {
+        return;
+    }
+
     /*
      * XXX bad kernel interface alert
      * For dirty bitmap, kernel allocates array of size aligned to
@@ -588,11 +595,6 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
             goto out;
         }
 
-        if (!mem->dirty_bmap) {
-            /* Allocate on the first log_sync, once and for all */
-            kvm_memslot_init_dirty_bitmap(mem);
-        }
-
         d.dirty_bitmap = mem->dirty_bmap;
         d.slot = mem->slot | (kml->as_id << 16);
         if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
@@ -1086,14 +1088,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         mem->start_addr = start_addr;
         mem->ram = ram;
         mem->flags = kvm_mem_flags(mr);
-
-        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-            /*
-             * Reallocate the bmap; it means it doesn't disappear in
-             * middle of a migrate.
- */ - kvm_memslot_init_dirty_bitmap(mem); - } + kvm_slot_init_dirty_bitmap(mem); err = kvm_set_user_memory_region(kml, mem, true); if (err) { fprintf(stderr, "%s: error registering slot: %s\n", __func__,

From patchwork Tue Apr 28 20:05:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 283839
From: Peter Xu
To: qemu-devel@nongnu.org
Subject: [PATCH RFC v2 6/9] KVM: Provide helper to sync dirty bitmap from slot to ramblock
Date: Tue, 28 Apr 2020 16:05:06 -0400
Message-Id: <20200428200509.13150-4-peterx@redhat.com>
In-Reply-To: <20200428195707.11980-1-peterx@redhat.com>
References: <20200428195707.11980-1-peterx@redhat.com>
Cc: pbonzini@redhat.com, dgilbert@redhat.com, Peter Xu

kvm_physical_sync_dirty_bitmap() calculates the ramblock offset in an awkward way from the MemoryRegionSection that is passed in by the caller.
In fact, the ramblock offset of a KVMSlot never changes during the slot's lifetime, so cache it in the KVMSlot structure when the slot is created.

With that, we can further simplify kvm_physical_sync_dirty_bitmap() with a helper that syncs the dirty bitmap of a specific KVMSlot to the ramblock dirty bitmap.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
---
 accel/kvm/kvm-all.c | 37 +++++++++++++++++--------------------
 include/sysemu/kvm_int.h | 2 ++
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 436b8fd899..dd21b86efa 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -525,15 +525,12 @@ static void kvm_log_stop(MemoryListener *listener, } /* get kvm's dirty pages bitmap and update qemu's */ -static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section, - unsigned long *bitmap) +static void kvm_slot_sync_dirty_pages(KVMSlot *slot) { - ram_addr_t start = section->offset_within_region + - memory_region_get_ram_addr(section->mr); - ram_addr_t pages = int128_get64(section->size) / qemu_real_host_page_size; + ram_addr_t start = slot->ram_start_offset; + ram_addr_t pages = slot->memory_size / qemu_real_host_page_size; - cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages); - return 0; + cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages); } #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1)) @@ -595,12 +592,10 @@ static void kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml, KVMState *s = kvm_state; KVMSlot *mem; hwaddr start_addr, size; - hwaddr slot_size, slot_offset = 0; + hwaddr slot_size; size = kvm_align_section(section, &start_addr); while (size) { - MemoryRegionSection subsection = *section; - slot_size = MIN(kvm_max_slot_size, size); mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); if (!mem) { @@ -609,12 +604,7 @@ static void kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml, }
kvm_slot_get_dirty_log(s, mem); - - subsection.offset_within_region += slot_offset; - subsection.size = int128_make64(slot_size); - kvm_get_dirty_pages_log_range(&subsection, mem->dirty_bmap); - - slot_offset += slot_size; + kvm_slot_sync_dirty_pages(mem); start_addr += slot_size; size -= slot_size; } @@ -1036,7 +1026,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, int err; MemoryRegion *mr = section->mr; bool writeable = !mr->readonly && !mr->rom_device; - hwaddr start_addr, size, slot_size; + hwaddr start_addr, size, slot_size, mr_offset; + ram_addr_t ram_start_offset; void *ram; if (!memory_region_is_ram(mr)) { @@ -1054,9 +1045,13 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, return; } - /* use aligned delta to align the ram address */ - ram = memory_region_get_ram_ptr(mr) + section->offset_within_region + - (start_addr - section->offset_within_address_space); + /* The offset of the kvmslot within the memory region */ + mr_offset = section->offset_within_region + start_addr - + section->offset_within_address_space; + + /* use aligned delta to align the ram address and offset */ + ram = memory_region_get_ram_ptr(mr) + mr_offset; + ram_start_offset = memory_region_get_ram_addr(mr) + mr_offset; kvm_slots_lock(kml); @@ -1092,6 +1087,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, mem->as_id = kml->as_id; mem->memory_size = slot_size; mem->start_addr = start_addr; + mem->ram_start_offset = ram_start_offset; mem->ram = ram; mem->flags = kvm_mem_flags(mr); kvm_slot_init_dirty_bitmap(mem); @@ -1102,6 +1098,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, abort(); } start_addr += slot_size; + ram_start_offset += slot_size; ram += slot_size; size -= slot_size; } while (size); diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index 4434e15ec7..1a19bfef80 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -25,6 +25,8 @@ typedef struct KVMSlot unsigned long *dirty_bmap; /* Cache of the address space 
ID */ int as_id; + /* Cache of the offset in ram address space */ + ram_addr_t ram_start_offset; } KVMSlot; typedef struct KVMMemoryListener {

From patchwork Tue Apr 28 20:05:09 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 283840
From: Peter Xu
To: qemu-devel@nongnu.org
Subject: [PATCH RFC v2 9/9] KVM: Dirty ring support
Date: Tue, 28 Apr 2020 16:05:09 -0400
Message-Id: <20200428200509.13150-7-peterx@redhat.com>
In-Reply-To: <20200428195707.11980-1-peterx@redhat.com>
References: <20200428195707.11980-1-peterx@redhat.com>
Cc: pbonzini@redhat.com, dgilbert@redhat.com, Peter Xu

The KVM dirty ring is a new interface for passing dirty bits from the kernel to userspace. Instead of using a bitmap for each memory region, the dirty ring contains an array of dirtied GPAs to fetch (in the form of offsets within slots). Each vcpu has one dirty ring bound to it.
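
To illustrate the idea, here is a minimal sketch of how a consumer might walk one such ring. It is not part of this patch: the toy_* names, the flag values and the byte-per-page dirty array are simplified stand-ins assumed for illustration, not the kernel's or QEMU's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the KVM UAPI headers. */
#define TOY_GFN_F_DIRTY 1u   /* entry has been published by the producer */
#define TOY_GFN_F_RESET 2u   /* entry has been collected by the consumer */

struct toy_dirty_gfn {
    uint32_t flags;
    uint32_t slot;    /* (as_id << 16) | slot_id, as decoded in this patch */
    uint64_t offset;  /* page offset within the slot */
};

/*
 * Collect dirty entries from a ring of ring_size entries, starting at *fetch.
 * Each collected offset is recorded in dirty[] (one byte per page, npages
 * entries, a single slot for simplicity) and the entry is marked collected.
 * Reaping stops at the first entry that is not dirty.
 * Returns the number of entries collected.
 */
static uint32_t toy_ring_reap(struct toy_dirty_gfn *ring, uint32_t ring_size,
                              uint32_t *fetch, uint8_t *dirty, size_t npages)
{
    uint32_t count = 0;

    for (;;) {
        struct toy_dirty_gfn *cur = &ring[*fetch % ring_size];

        if (cur->flags != TOY_GFN_F_DIRTY) {
            break;
        }
        if (cur->offset < npages) {   /* ignore out-of-range offsets */
            dirty[cur->offset] = 1;
        }
        cur->flags = TOY_GFN_F_RESET;
        (*fetch)++;
        count++;
    }
    return count;
}
```

In the patch itself this role is played by kvm_dirty_ring_reap_one(), which marks pages in the per-slot dirty bitmap instead of a byte array and keeps the fetch index in the CPUState.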
kvm_dirty_ring_reap() is the main function for collecting dirty pages from the dirty rings. It can be called either by a standalone reaper thread that runs in the background and collects dirty pages for the whole VM, or directly by any thread that holds the BQL.

Signed-off-by: Peter Xu
---
 accel/kvm/kvm-all.c | 341 ++++++++++++++++++++++++++++++++++++++++-
 accel/kvm/trace-events | 7 +
 include/hw/core/cpu.h | 8 +
 3 files changed, 353 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index fbb0a3b1e9..236dbcd536 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -15,6 +15,7 @@ #include "qemu/osdep.h" #include +#include #include @@ -75,6 +76,25 @@ struct KVMParkedVcpu { QLIST_ENTRY(KVMParkedVcpu) node; }; +enum KVMDirtyRingReaperState { + KVM_DIRTY_RING_REAPER_NONE = 0, + /* The reaper is sleeping */ + KVM_DIRTY_RING_REAPER_WAIT, + /* The reaper is reaping for dirty pages */ + KVM_DIRTY_RING_REAPER_REAPING, +}; + +/* + * KVM reaper instance, responsible for collecting the KVM dirty bits + * via the dirty ring.
+ */ +struct KVMDirtyRingReaper { + /* The reaper thread */ + QemuThread reaper_thr; + volatile uint64_t reaper_iteration; /* iteration number of reaper thr */ + volatile enum KVMDirtyRingReaperState reaper_state; /* reap thr state */ +}; + struct KVMState { AccelState parent_obj; @@ -121,7 +141,6 @@ struct KVMState void *memcrypt_handle; int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len); - /* For "info mtree -f" to tell if an MR is registered in KVM */ int nr_as; struct KVMAs { KVMMemoryListener *ml; @@ -130,6 +149,7 @@ struct KVMState bool kvm_dirty_ring_enabled; /* Whether KVM dirty ring is enabled */ uint64_t kvm_dirty_ring_size; /* Size of the per-vcpu dirty ring */ uint32_t kvm_dirty_gfn_count; /* Number of dirty GFNs per ring */ + struct KVMDirtyRingReaper reaper; }; KVMState *kvm_state; @@ -359,6 +379,13 @@ int kvm_destroy_vcpu(CPUState *cpu) goto err; } + if (cpu->kvm_dirty_gfns) { + ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_size); + if (ret < 0) { + goto err; + } + } + vcpu = g_malloc0(sizeof(*vcpu)); vcpu->vcpu_id = kvm_arch_vcpu_id(cpu); vcpu->kvm_fd = cpu->kvm_fd; @@ -423,6 +450,19 @@ int kvm_init_vcpu(CPUState *cpu) (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE; } + if (s->kvm_dirty_ring_enabled) { + /* Use MAP_SHARED to share pages with the kernel */ + cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size, + PROT_READ | PROT_WRITE, MAP_SHARED, + cpu->kvm_fd, + PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET); + if (cpu->kvm_dirty_gfns == MAP_FAILED) { + ret = -errno; + DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret); + goto err; + } + } + ret = kvm_arch_init_vcpu(cpu); err: return ret; @@ -536,6 +576,11 @@ static void kvm_slot_sync_dirty_pages(KVMSlot *slot) cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages); } +static void kvm_slot_reset_dirty_pages(KVMSlot *slot) +{ + memset(slot->dirty_bmap, 0, slot->dirty_bmap_size); +} + #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1)) /* Allocate the dirty 
bitmap for a slot */ @@ -579,6 +624,198 @@ static void kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot) } } +/* Must be called with slots_lock held for all the address spaces. */ +static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id, + uint32_t slot_id, uint64_t offset) +{ + KVMMemoryListener *kml; + KVMSlot *mem; + + if (as_id >= s->nr_as) { + return; + } + + kml = s->as[as_id].ml; + mem = &kml->slots[slot_id]; + + if (!mem->memory_size || offset >= (mem->memory_size / TARGET_PAGE_SIZE)) { + return; + } + + set_bit(offset, mem->dirty_bmap); +} + +static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn) +{ + return gfn->flags == KVM_DIRTY_GFN_F_DIRTY; +} + +static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn) +{ + gfn->flags = KVM_DIRTY_GFN_F_RESET; +} + +/* + * Must be called with slots_lock held for all the address spaces. Returns + * the number of dirty pages collected from this dirty ring. + */ +static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu) +{ + struct kvm_dirty_gfn *dirty_gfns = cpu->kvm_dirty_gfns, *cur; + uint32_t gfn_count = s->kvm_dirty_gfn_count; + uint32_t count = 0, fetch = cpu->kvm_fetch_index; + + assert(dirty_gfns && gfn_count); + trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index); + + while (true) { + cur = &dirty_gfns[fetch % gfn_count]; + if (!dirty_gfn_is_dirtied(cur)) { + break; + } + kvm_dirty_ring_mark_page(s, cur->slot >> 16, cur->slot & 0xffff, + cur->offset); + dirty_gfn_set_collected(cur); + trace_kvm_dirty_ring_page(cpu->cpu_index, fetch, cur->offset); + fetch++; + count++; + } + cpu->kvm_fetch_index = fetch; + + return count; +} + +/* + * Currently for simplicity, we must hold BQL before calling this. We can + * consider dropping the BQL once all the race conditions are understood.
+ */ +static uint64_t kvm_dirty_ring_reap(KVMState *s) +{ + KVMMemoryListener *kml; + int ret, i, locked_count = s->nr_as; + CPUState *cpu; + uint64_t total = 0; + int64_t stamp; + + /* + * We need to lock all kvm slots for all address spaces here, + * because: + * + * (1) We need to mark dirty for dirty bitmaps in multiple slots + * and for tons of pages, so it's better to take the lock here + * once rather than once per page. And more importantly, + * + * (2) We must _NOT_ publish dirty bits to the other threads + * (e.g., the migration thread) via the kvm memory slot dirty + * bitmaps before correctly re-protect those dirtied pages. + * Otherwise we can have potential risk of data corruption if + * the page data is read in the other thread before we do + * reset below. + */ + for (i = 0; i < s->nr_as; i++) { + kml = s->as[i].ml; + if (!kml) { + /* + * This is tricky - we grow s->as[] dynamically now. Take + * care of that case. We also assumed the as[] will fill + * one by one starting from zero. Without this, we race + * with register_smram_listener. + * + * TODO: make all these prettier... + */ + locked_count = i; + break; + } + kvm_slots_lock(kml); + } + + stamp = get_clock(); + + CPU_FOREACH(cpu) { + total += kvm_dirty_ring_reap_one(s, cpu); + } + + if (total) { + ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS); + assert(ret == total); + } + + stamp = get_clock() - stamp; + + if (total) { + trace_kvm_dirty_ring_reap(total, stamp / 1000); + } + + /* Unlock whatever locks that we have locked */ + for (i = 0; i < locked_count; i++) { + kvm_slots_unlock(s->as[i].ml); + } + + return total; +} + +static void do_kvm_cpu_synchronize_kick(CPUState *cpu, run_on_cpu_data arg) +{ + /* No need to do anything */ +} + +/* + * Kick all vcpus out in a synchronized way. When returned, we + * guarantee that every vcpu has been kicked and at least returned to + * userspace once. 
+ */ +static void kvm_cpu_synchronize_kick_all(void) +{ + CPUState *cpu; + + CPU_FOREACH(cpu) { + run_on_cpu(cpu, do_kvm_cpu_synchronize_kick, RUN_ON_CPU_NULL); + } +} + +/* + * Flush all the existing dirty pages to the KVM slot buffers. When + * this call returns, we guarantee that all pages dirtied before + * this function was called have been put into the per-kvmslot + * dirty bitmap. + * + * To achieve this, we need to: + * + * (1) Kick all vcpus out. This makes sure that any dirty buffers + * that may still be in the hardware (PML) are flushed into the + * dirty rings. After that, + * + * (2) Reap the dirty rings directly from the calling thread so that + * all the dirty pages in the dirty rings are collected. + * + * This function must be called with BQL held. + */ +static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r) +{ + trace_kvm_dirty_ring_flush(0); + + /* + * The function needs to be serialized. Since this function + * should always be called with BQL held, serialization is guaranteed. + * However, let's be sure of it. + */ + assert(qemu_mutex_iothread_locked()); + + /* + * First make sure to flush the hardware buffers by kicking all + * vcpus out in a synchronous way. + */ + kvm_cpu_synchronize_kick_all(); + + /* + * Recycle the dirty bits outside the reaper thread. This is safe because + * we hold the BQL, which kvm_dirty_ring_reap() requires. + */ + kvm_dirty_ring_reap(kvm_state); + + trace_kvm_dirty_ring_flush(1); +} + /** * kvm_physical_sync_dirty_bitmap - Sync dirty bitmap from kernel space * @@ -1111,6 +1348,51 @@ out: kvm_slots_unlock(kml); } +static void *kvm_dirty_ring_reaper_thread(void *data) +{ + KVMState *s = data; + struct KVMDirtyRingReaper *r = &s->reaper; + + rcu_register_thread(); + + trace_kvm_dirty_ring_reaper("init"); + + while (true) { + r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT; + trace_kvm_dirty_ring_reaper("wait"); + /* + * TODO: provide a smarter timeout rather than a constant?
+ */ + sleep(1); + + trace_kvm_dirty_ring_reaper("wakeup"); + r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING; + + qemu_mutex_lock_iothread(); + kvm_dirty_ring_reap(s); + qemu_mutex_unlock_iothread(); + + r->reaper_iteration++; + } + + trace_kvm_dirty_ring_reaper("exit"); + + rcu_unregister_thread(); + + return NULL; +} + +static int kvm_dirty_ring_reaper_init(KVMState *s) +{ + struct KVMDirtyRingReaper *r = &s->reaper; + + qemu_thread_create(&r->reaper_thr, "kvm-reaper", + kvm_dirty_ring_reaper_thread, + s, QEMU_THREAD_JOINABLE); + + return 0; +} + static void kvm_region_add(MemoryListener *listener, MemoryRegionSection *section) { @@ -1139,6 +1421,36 @@ static void kvm_log_sync(MemoryListener *listener, kvm_slots_unlock(kml); } +static void kvm_log_sync_global(MemoryListener *l) +{ + KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener); + KVMState *s = kvm_state; + KVMSlot *mem; + int i; + + /* Flush all kernel dirty addresses into KVMSlot dirty bitmap */ + kvm_dirty_ring_flush(&s->reaper); + + /* + * TODO: make this faster when nr_slots is big while there are + * only a few used slots (small VMs). + */ + kvm_slots_lock(kml); + for (i = 0; i < s->nr_slots; i++) { + mem = &kml->slots[i]; + if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { + kvm_slot_sync_dirty_pages(mem); + /* + * This is not needed by KVM_GET_DIRTY_LOG because the + * ioctl will unconditionally overwrite the whole region. + * However kvm dirty ring has no such side effect. 
+ */ + kvm_slot_reset_dirty_pages(mem); + } + } + kvm_slots_unlock(kml); +} + static void kvm_log_clear(MemoryListener *listener, MemoryRegionSection *section) { @@ -1245,10 +1557,15 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, kml->listener.region_del = kvm_region_del; kml->listener.log_start = kvm_log_start; kml->listener.log_stop = kvm_log_stop; - kml->listener.log_sync = kvm_log_sync; - kml->listener.log_clear = kvm_log_clear; kml->listener.priority = 10; + if (s->kvm_dirty_ring_enabled) { + kml->listener.log_sync_global = kvm_log_sync_global; + } else { + kml->listener.log_sync = kvm_log_sync; + kml->listener.log_clear = kvm_log_clear; + } + memory_listener_register(&kml->listener, as); for (i = 0; i < s->nr_as; ++i) { @@ -2138,6 +2455,13 @@ static int kvm_init(MachineState *ms) qemu_balloon_inhibit(true); } + if (s->kvm_dirty_ring_enabled) { + ret = kvm_dirty_ring_reaper_init(s); + if (ret) { + goto err; + } + } + return 0; err: @@ -2445,6 +2769,17 @@ int kvm_cpu_exec(CPUState *cpu) case KVM_EXIT_INTERNAL_ERROR: ret = kvm_handle_internal_error(cpu, run); break; + case KVM_EXIT_DIRTY_RING_FULL: + /* + * We shouldn't continue if the dirty ring of this vcpu is + * still full. Got kicked by KVM_RESET_DIRTY_RINGS. 
+ */ + trace_kvm_dirty_ring_full(cpu->cpu_index); + qemu_mutex_lock_iothread(); + kvm_dirty_ring_reap(kvm_state); + qemu_mutex_unlock_iothread(); + ret = 0; + break; case KVM_EXIT_SYSTEM_EVENT: switch (run->system_event.type) { case KVM_SYSTEM_EVENT_SHUTDOWN: diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events index 4fb6e59d19..89ef99569f 100644 --- a/accel/kvm/trace-events +++ b/accel/kvm/trace-events @@ -16,4 +16,11 @@ kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_ kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d" kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d" kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32 +kvm_dirty_ring_full(int id) "vcpu %d" +kvm_dirty_ring_reap_vcpu(int id) "vcpu %d" +kvm_dirty_ring_page(int vcpu, uint32_t slot, uint64_t offset) "vcpu %d fetch %"PRIu32" offset 0x%"PRIx64 +kvm_dirty_ring_reaper(const char *s) "%s" +kvm_dirty_ring_reap(uint64_t count, int64_t t) "reaped %"PRIu64" pages (took %"PRIi64" us)" +kvm_dirty_ring_reaper_kick(const char *reason) "%s" +kvm_dirty_ring_flush(int finished) "%d" diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index b26fdb5ab8..2d2bf61629 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -340,6 +340,11 @@ struct qemu_work_item; * @ignore_memory_transaction_failures: Cached copy of the MachineState * flag of the same name: allows the board to suppress calling of the * CPU do_transaction_failed hook function. + * @kvm_dirty_ring_full: + * Whether the kvm dirty ring of this vcpu is soft-full. + * @kvm_dirty_ring_avail: + * Semaphore to be posted when the kvm dirty ring of the vcpu is + * available again. 
* * State of one CPU core or thread. */ @@ -407,9 +412,12 @@ struct CPUState { */ uintptr_t mem_io_pc; + /* Only used in KVM */ int kvm_fd; struct KVMState *kvm_state; struct kvm_run *kvm_run; + struct kvm_dirty_gfn *kvm_dirty_gfns; + uint32_t kvm_fetch_index; /* Used for events with 'vcpu' and *without* the 'disabled' properties */ DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);