From: Will Deacon
To: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Will Deacon, "Paul E. McKenney",
    Benjamin Herrenschmidt, Arnd Bergmann, Peter Zijlstra, Andrea Parri,
    Daniel Lustig, David Howells, Alan Stern, Linus Torvalds
Subject: [RFC PATCH] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
Date: Mon, 11 Feb 2019 17:29:48 +0000
Message-Id: <20190211172948.3322-1-will.deacon@arm.com>

The "KERNEL I/O BARRIER EFFECTS" section of memory-barriers.txt is vague,
x86-centric, out-of-date, incomplete and demonstrably incorrect in places.
This is largely because I/O ordering is a horrible can of worms, but also
because the document has stagnated as our understanding has evolved.

Attempt to address some of that by rewriting the section based on
recent(-ish) discussions with Arnd, BenH and others. Maybe one day we'll
find a way to formalise this stuff, but for now let's at least try to make
the English easier to understand.

Cc: "Paul E. McKenney"
Cc: Benjamin Herrenschmidt
Cc: Arnd Bergmann
Cc: Peter Zijlstra
Cc: Andrea Parri
Cc: Daniel Lustig
Cc: David Howells
Cc: Alan Stern
Cc: Linus Torvalds
Signed-off-by: Will Deacon
---
 Documentation/memory-barriers.txt | 115 ++++++++++++++++++++------------------
 1 file changed, 62 insertions(+), 53 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1c22b21ae922..d08b49b2c011 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2599,72 +2599,81 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
 KERNEL I/O BARRIER EFFECTS
 ==========================
 
-When accessing I/O memory, drivers should use the appropriate accessor
-functions:
-
- (*) inX(), outX():
-
-     These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept. The i386 and x86_64 processors
-     do indeed have special I/O space access cycles and instructions, but many
-     CPUs don't have such a concept.
-
-     The PCI bus, amongst others, defines an I/O space concept which - on such
-     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
-     space. However, it may also be mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate I/O
-     spaces.
-
-     Accesses to this space may be fully synchronous (as on i386), but
-     intermediary bridges (such as the PCI host bridge) may not fully honour
-     that.
-
-     They are guaranteed to be fully ordered with respect to each other.
-
-     They are not guaranteed to be fully ordered with respect to other types of
-     memory and I/O operation.
+Interfacing with peripherals via I/O accesses is deeply architecture and device
+specific. Therefore, drivers which are inherently non-portable may rely on
+specific behaviours of their target systems in order to achieve synchronization
+in the most lightweight manner possible. For drivers intending to be portable
+between multiple architectures and bus implementations, the kernel offers a
+series of accessor functions that provide various degrees of ordering
+guarantees:
 
  (*) readX(), writeX():
 
-     Whether these are guaranteed to be fully ordered and uncombined with
-     respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing. On later
-     i386 architecture machines, for example, this is controlled by way of the
-     MTRR registers.
+     The readX() and writeX() MMIO accessors take a pointer to the peripheral
+     being accessed as an __iomem * parameter. For pointers mapped with the
+     default I/O attributes (e.g. those returned by ioremap()), the
+     ordering guarantees are as follows:
+
+     1. All readX() and writeX() accesses to the same peripheral are ordered
+        with respect to each other. For example, this ensures that MMIO register
+        writes by the CPU to a particular device will arrive in program order.
+
+     2. A writeX() by the CPU to the peripheral will first wait for the
+        completion of all prior CPU writes to memory. For example, this ensures
+        that writes by the CPU to an outbound DMA buffer allocated by
+        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
+        to its MMIO control register to trigger the transfer.
+
+     3. A readX() by the CPU from the peripheral will complete before any
+        subsequent CPU reads from memory can begin. For example, this ensures
+        that reads by the CPU from an incoming DMA buffer allocated by
+        dma_alloc_coherent() will not see stale data after reading from the DMA
+        engine's MMIO status register to establish that the DMA transfer has
+        completed.
+
+     4. A readX() by the CPU from the peripheral will complete before any
+        subsequent delay() loop can begin execution. For example, this ensures
+        that two MMIO register writes by the CPU to a peripheral will arrive at
+        least 1us apart if the first write is immediately read back with readX()
+        and udelay(1) is called prior to the second writeX().
+
+     __iomem pointers obtained with non-default attributes (e.g. those returned
+     by ioremap_wc()) are unlikely to provide many of these guarantees. If
+     ordering is required for such mappings, then the mandatory barriers should
+     be used in conjunction with the _relaxed() accessors defined below.
+
+ (*) readX_relaxed(), writeX_relaxed():
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
-     provided they're not accessing a prefetchable device.
-
-     However, intermediary hardware (such as a PCI bridge) may indulge in
-     deferral if it so wishes; to flush a store, a load from the same location
-     is preferred[*], but a load from the same device or from configuration
-     space should suffice for PCI.
-
-     [*] NOTE! attempting to load from the same location as was written to may
-         cause a malfunction - consider the 16550 Rx/Tx serial registers for
-         example.
-
-     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
-     force stores to be ordered.
+     These are similar to readX() and writeX(), but provide weaker memory
+     ordering guarantees. Specifically, they do not guarantee ordering with
+     respect to normal memory accesses or delay() loops (i.e. bullets 2-4
+     above), but they are still guaranteed to be ordered with respect to other
+     accesses to the same peripheral when operating on __iomem pointers mapped
+     with the default I/O attributes.
 
-     Please refer to the PCI specification for more information on interactions
-     between PCI transactions.
+ (*) inX(), outX():
 
- (*) readX_relaxed(), writeX_relaxed()
+     The inX() and outX() accessors are intended to access legacy port-mapped
+     I/O peripherals, which may require special instructions on some
+     architectures (notably x86). The port number of the peripheral being
+     accessed is passed as an argument.
 
-     These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees. Specifically, they do not guarantee ordering with
-     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations. If the latter is
-     required, an mmiowb() barrier can be used. Note that relaxed accesses to
-     the same peripheral are guaranteed to be ordered with respect to each
-     other.
+     Since many CPU architectures ultimately access these peripherals via an
+     internal virtual memory mapping, the portable ordering guarantees provided
+     by inX() and outX() are the same as those provided by readX() and writeX()
+     respectively when accessing a mapping with the default I/O attributes.
 
  (*) ioreadX(), iowriteX()
 
      These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
 
+All of these accessors assume that the underlying peripheral is little-endian,
+and will therefore perform byte-swapping operations on big-endian architectures.
+
+Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
+operations is a dangerous sport which may require the use of mmiowb(). See the
+subsection "Acquires vs I/O accesses" for more information.
 
 ========================================
 ASSUMED MINIMUM EXECUTION ORDERING MODEL
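Guarantees 2 and 3 in the rewritten readX()/writeX() text describe the usual "fill a DMA buffer, ring the doorbell, poll the status register, read the result" pattern. The following is a rough userspace model of that pattern, not kernel code and not part of the patch: it uses C11 atomics for the "registers" and a POSIX thread as a stand-in device, and every name in it (model_writel(), model_readl(), doorbell_reg, status_reg, dma_out, dma_in and so on) is hypothetical. The ordered accessors are built from the relaxed ones plus a fence, loosely mirroring how some architectures (arm64, for example) define writel() as a write barrier followed by writel_relaxed(). It says nothing about bus-level effects such as posted writes, nor about guarantee 4.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "MMIO registers", modelled as atomics in normal memory. */
static _Atomic uint32_t doorbell_reg;   /* driver -> device: start transfer */
static _Atomic uint32_t status_reg;     /* device -> driver: transfer done */

/* Hypothetical DMA buffers: plain (non-atomic) normal memory. */
static uint32_t dma_out[4];             /* outbound: written by the driver */
static uint32_t dma_in[4];              /* inbound: written by the device */

/* Relaxed accessors: no ordering against normal memory accesses. */
static void model_writel_relaxed(uint32_t v, _Atomic uint32_t *reg)
{
	atomic_store_explicit(reg, v, memory_order_relaxed);
}

static uint32_t model_readl_relaxed(_Atomic uint32_t *reg)
{
	return atomic_load_explicit(reg, memory_order_relaxed);
}

/* Models guarantee 2: prior memory writes are ordered before the register write. */
static void model_writel(uint32_t v, _Atomic uint32_t *reg)
{
	atomic_thread_fence(memory_order_release);
	model_writel_relaxed(v, reg);
}

/* Models guarantee 3: later memory reads are ordered after the register read. */
static uint32_t model_readl(_Atomic uint32_t *reg)
{
	uint32_t v = model_readl_relaxed(reg);
	atomic_thread_fence(memory_order_acquire);
	return v;
}

/* Stand-in device: wait for the doorbell, "DMA" each word back doubled. */
static void *model_device(void *arg)
{
	(void)arg;
	while (atomic_load_explicit(&doorbell_reg, memory_order_relaxed) == 0)
		;                                   /* spin until the doorbell rings */
	atomic_thread_fence(memory_order_acquire);  /* buffer reads after doorbell */
	for (int i = 0; i < 4; i++)
		dma_in[i] = dma_out[i] * 2;
	atomic_thread_fence(memory_order_release);  /* results before status flag */
	atomic_store_explicit(&status_reg, 1, memory_order_relaxed);
	return NULL;
}

/* Driver side: fill buffer, ring doorbell, poll status, sum the results. */
static uint32_t model_run_transfer(void)
{
	pthread_t dev;
	uint32_t sum = 0;

	atomic_store(&doorbell_reg, 0);
	atomic_store(&status_reg, 0);
	int rc = pthread_create(&dev, NULL, model_device, NULL);
	assert(rc == 0);
	(void)rc;  /* keep NDEBUG builds warning-free */

	for (int i = 0; i < 4; i++)
		dma_out[i] = (uint32_t)i + 1;   /* plain stores to outbound buffer */
	model_writel(1, &doorbell_reg);     /* must not overtake the stores above */

	while (model_readl(&status_reg) == 0)
		;                               /* poll the status register */
	for (int i = 0; i < 4; i++)
		sum += dma_in[i];               /* must not observe stale data */

	pthread_join(dev, NULL);
	return sum;
}
```

With the fences in place, model_run_transfer() returns 2 * (1 + 2 + 3 + 4) = 20. Swapping model_writel() and model_readl() on the driver path for their _relaxed() counterparts removes the fences and turns the buffer accesses into data races, which is the C11 analogue of the relaxed accessors not providing bullets 2 and 3.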
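The new text also states that all of these accessors treat the peripheral as little-endian. As a small userspace illustration (not kernel code and not part of the patch), the sketch below models a register window as a byte array and builds each 32-bit value byte by byte, so the bytes the hypothetical device sees are little-endian whatever the host's byte order. The names sketch_regs, sketch_writel() and sketch_readl() are hypothetical; the generic kernel implementation gets the same effect with conversion helpers such as __le32_to_cpu() on top of a raw access rather than byte-by-byte copying, and this sketch ignores ordering entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register window; a real driver would use an __iomem mapping. */
static uint8_t sketch_regs[16];

/* Store a 32-bit value in little-endian byte order, regardless of host order. */
static void sketch_writel(uint32_t val, size_t off)
{
	assert(off + 4 <= sizeof(sketch_regs));
	sketch_regs[off + 0] = (uint8_t)(val & 0xff);
	sketch_regs[off + 1] = (uint8_t)((val >> 8) & 0xff);
	sketch_regs[off + 2] = (uint8_t)((val >> 16) & 0xff);
	sketch_regs[off + 3] = (uint8_t)((val >> 24) & 0xff);
}

/* Load a little-endian 32-bit value and return it in host byte order. */
static uint32_t sketch_readl(size_t off)
{
	assert(off + 4 <= sizeof(sketch_regs));
	return (uint32_t)sketch_regs[off + 0] |
	       ((uint32_t)sketch_regs[off + 1] << 8) |
	       ((uint32_t)sketch_regs[off + 2] << 16) |
	       ((uint32_t)sketch_regs[off + 3] << 24);
}
```

After sketch_writel(0x12345678, 4), the least significant byte 0x78 sits at offset 4 on both little- and big-endian hosts, and sketch_readl(4) returns the original value.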