From patchwork Tue Mar 26 23:41:16 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paul E. McKenney" <paulmck@linux.ibm.com>
X-Patchwork-Id: 161238
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, mingo@kernel.org
Cc: stern@rowland.harvard.edu, andrea.parri@amarulasolutions.com,
 will.deacon@arm.com, peterz@infradead.org, boqun.feng@gmail.com,
 npiggin@gmail.com, dhowells@redhat.com, j.alglave@ucl.ac.uk,
 luc.maranget@inria.fr, akiyks@gmail.com,
 "Paul E. McKenney" <paulmck@linux.ibm.com>,
 Benjamin Herrenschmidt <benh@kernel.crashing.org>,
 Michael Ellerman <mpe@ellerman.id.au>, Arnd Bergmann <arnd@arndb.de>,
 Palmer Dabbelt <palmer@sifive.com>, Daniel Lustig <dlustig@nvidia.com>,
 Linus Torvalds <torvalds@linux-foundation.org>,
 "Maciej W. Rozycki" <macro@linux-mips.org>,
 Mikulas Patocka <mpatocka@redhat.com>
Subject: [PATCH tip/core/rcu 04/21] docs/memory-barriers.txt: Rewrite
 "KERNEL I/O BARRIER EFFECTS" section
Date: Tue, 26 Mar 2019 16:41:16 -0700
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190326234114.GA23843@linux.ibm.com>
References: <20190326234114.GA23843@linux.ibm.com>
Message-Id: <20190326234133.24962-4-paulmck@linux.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Will Deacon <will.deacon@arm.com>

The "KERNEL I/O BARRIER EFFECTS" section of memory-barriers.txt is vague,
x86-centric, out-of-date, incomplete and demonstrably incorrect in places.
This is largely because I/O ordering is a horrible can of worms, but also
because the document has stagnated as our understanding has evolved.

Attempt to address some of that, by rewriting the section based on
recent(-ish) discussions with Arnd, BenH and others. Maybe one day we'll
find a way to formalise this stuff, but for now let's at least try to
make the English easier to understand.

Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
---
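Illustrative note, not part of the change: guarantees 2 and 3 for readX()
and writeX() in the rewritten section describe the usual "fill the buffer,
ring the doorbell, poll status, read the result" driver pattern. The sketch
below models only the program order of that pattern in plain userspace C.
Every name in it (struct fake_dev, model_writel(), model_readl_status(),
model_start_transfer() and the FAKE_* constants) is hypothetical. They are
stand-ins, not the kernel accessors, and a single-threaded model cannot
demonstrate the hardware ordering itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define FAKE_BUF_LEN     16
#define FAKE_CTRL_START  0x1u
#define FAKE_STATUS_DONE 0x1u

struct fake_dev {
	uint32_t ctrl;              /* models an MMIO control register */
	uint32_t status;            /* models an MMIO status register */
	const uint8_t *src;         /* buffer the "device" reads on a doorbell */
	uint8_t sink[FAKE_BUF_LEN]; /* where the "device" copies the data */
};

/* Stand-in for writel(): storing START makes the fake device run. */
static void model_writel(struct fake_dev *dev, uint32_t val)
{
	dev->ctrl = val;
	if (val & FAKE_CTRL_START) {
		memcpy(dev->sink, dev->src, FAKE_BUF_LEN);
		dev->status = FAKE_STATUS_DONE;
	}
}

/* Stand-in for readl() on the status register. */
static uint32_t model_readl_status(const struct fake_dev *dev)
{
	return dev->status;
}

/* Driver-side pattern: fill buffer, ring doorbell, poll status, check data. */
static int model_start_transfer(struct fake_dev *dev, uint8_t *buf, uint8_t fill)
{
	assert(dev && buf);

	memset(buf, fill, FAKE_BUF_LEN);    /* CPU writes to the buffer ...      */
	dev->src = buf;
	model_writel(dev, FAKE_CTRL_START); /* ... precede the doorbell (item 2) */

	/* Status is read before the data is examined (item 3). */
	if (!(model_readl_status(dev) & FAKE_STATUS_DONE))
		return -1;

	return (dev->sink[0] == fill && dev->sink[FAKE_BUF_LEN - 1] == fill) ? 0 : -1;
}
```

In a real driver the buffer would presumably come from dma_alloc_coherent()
and the register accesses would go through an __iomem mapping; the model
only mirrors the sequence of steps.
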
 Documentation/memory-barriers.txt | 115 ++++++++++++++++++------------
 1 file changed, 70 insertions(+), 45 deletions(-)
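
Illustrative note, not part of the change: the rewritten section states
that the accessors assume a little-endian peripheral and byte-swap on
big-endian architectures. The userspace sketch below shows what that means
for the value the CPU ends up with. The helpers model_le32_load(),
model_le32_store() and model_le32_selfcheck() are hypothetical names made
up for this note; they are not the kernel accessors and only model the
byte-order conversion, not any ordering behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a 32-bit little-endian register image into
 * a CPU value. Shifting bytes explicitly gives the same result on
 * little-endian and big-endian hosts.
 */
static uint32_t model_le32_load(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: encode a CPU value as a little-endian register image. */
static void model_le32_store(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Round-trip check: what is stored is what is loaded back. */
static void model_le32_selfcheck(void)
{
	uint8_t reg[4];

	model_le32_store(reg, 0x12345678u);
	assert(reg[0] == 0x78);	/* least significant byte comes first */
	assert(model_le32_load(reg) == 0x12345678u);
}
```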

-- 
2.17.1

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1c22b21ae922..158947ae78c2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2599,72 +2599,97 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
 KERNEL I/O BARRIER EFFECTS
 ==========================
 
-When accessing I/O memory, drivers should use the appropriate accessor
-functions:
+Interfacing with peripherals via I/O accesses is deeply architecture and device
+specific. Therefore, drivers which are inherently non-portable may rely on
+specific behaviours of their target systems in order to achieve synchronization
+in the most lightweight manner possible. For drivers intending to be portable
+between multiple architectures and bus implementations, the kernel offers a
+series of accessor functions that provide various degrees of ordering
+guarantees:
 
- (*) inX(), outX():
+ (*) readX(), writeX():
 
-     These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept.  The i386 and x86_64 processors
-     do indeed have special I/O space access cycles and instructions, but many
-     CPUs don't have such a concept.
+     The readX() and writeX() MMIO accessors take a pointer to the peripheral
+     being accessed as an __iomem * parameter. For pointers mapped with the
+     default I/O attributes (e.g. those returned by ioremap()), the
+     ordering guarantees are as follows:
 
-     The PCI bus, amongst others, defines an I/O space concept which - on such
-     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
-     space.  However, it may also be mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate I/O
-     spaces.
+     1. All readX() and writeX() accesses to the same peripheral are ordered
+        with respect to each other. For example, this ensures that MMIO register
+	writes by the CPU to a particular device will arrive in program order.
 
-     Accesses to this space may be fully synchronous (as on i386), but
-     intermediary bridges (such as the PCI host bridge) may not fully honour
-     that.
+     2. A writeX() by the CPU to the peripheral will first wait for the
+        completion of all prior CPU writes to memory. For example, this ensures
+        that writes by the CPU to an outbound DMA buffer allocated by
+        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
+        to its MMIO control register to trigger the transfer.
 
-     They are guaranteed to be fully ordered with respect to each other.
+     3. A readX() by the CPU from the peripheral will complete before any
+	subsequent CPU reads from memory can begin. For example, this ensures
+	that reads by the CPU from an incoming DMA buffer allocated by
+	dma_alloc_coherent() will not see stale data after reading from the DMA
+	engine's MMIO status register to establish that the DMA transfer has
+	completed.
 
-     They are not guaranteed to be fully ordered with respect to other types of
-     memory and I/O operation.
+     4. A readX() by the CPU from the peripheral will complete before any
+	subsequent delay() loop can begin execution. For example, this ensures
+	that two MMIO register writes by the CPU to a peripheral will arrive at
+	least 1us apart if the first write is immediately read back with readX()
+	and udelay(1) is called prior to the second writeX().
 
- (*) readX(), writeX():
+     __iomem pointers obtained with non-default attributes (e.g. those returned
+     by ioremap_wc()) are unlikely to provide many of these guarantees.
 
-     Whether these are guaranteed to be fully ordered and uncombined with
-     respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing.  On later
-     i386 architecture machines, for example, this is controlled by way of the
-     MTRR registers.
+ (*) readX_relaxed(), writeX_relaxed():
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
-     provided they're not accessing a prefetchable device.
+     These are similar to readX() and writeX(), but provide weaker memory
+     ordering guarantees. Specifically, they do not guarantee ordering with
+     respect to normal memory accesses or delay() loops (i.e. bullets 2-4 above)
+     but they are still guaranteed to be ordered with respect to other accesses
+     to the same peripheral when operating on __iomem pointers mapped with the
+     default I/O attributes.
 
-     However, intermediary hardware (such as a PCI bridge) may indulge in
-     deferral if it so wishes; to flush a store, a load from the same location
-     is preferred[*], but a load from the same device or from configuration
-     space should suffice for PCI.
+ (*) readsX(), writesX():
 
-     [*] NOTE! attempting to load from the same location as was written to may
-	 cause a malfunction - consider the 16550 Rx/Tx serial registers for
-	 example.
+     The readsX() and writesX() MMIO accessors are designed for accessing
+     register-based, memory-mapped FIFOs residing on peripherals that are not
+     capable of performing DMA. Consequently, they provide only the ordering
+     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
 
-     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
-     force stores to be ordered.
+ (*) inX(), outX():
 
-     Please refer to the PCI specification for more information on interactions
-     between PCI transactions.
+     The inX() and outX() accessors are intended to access legacy port-mapped
+     I/O peripherals, which may require special instructions on some
+     architectures (notably x86). The port number of the peripheral being
+     accessed is passed as an argument.
 
- (*) readX_relaxed(), writeX_relaxed()
+     Since many CPU architectures ultimately access these peripherals via an
+     internal virtual memory mapping, the portable ordering guarantees provided
+     by inX() and outX() are the same as those provided by readX() and writeX()
+     respectively when accessing a mapping with the default I/O attributes.
 
-     These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees.  Specifically, they do not guarantee ordering with
-     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations.  If the latter is
-     required, an mmiowb() barrier can be used.  Note that relaxed accesses to
-     the same peripheral are guaranteed to be ordered with respect to each
-     other.
+     Device drivers may expect outX() to emit a non-posted write transaction
+     that waits for a completion response from the I/O peripheral before
+     returning. This is not guaranteed by all architectures and is therefore
+     not part of the portable ordering semantics.
+
+ (*) insX(), outsX():
+
+     As above, the insX() and outsX() accessors provide the same ordering
+     guarantees as readsX() and writesX() respectively when accessing a mapping
+     with the default I/O attributes.
 
  (*) ioreadX(), iowriteX()
 
      These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
 
+All of these accessors assume that the underlying peripheral is little-endian,
+and will therefore perform byte-swapping operations on big-endian architectures.
+
+Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
+operations is a dangerous sport which may require the use of mmiowb(). See the
+subsection "Acquires vs I/O accesses" for more information.
 
 ========================================
 ASSUMED MINIMUM EXECUTION ORDERING MODEL