| Message ID | 1600085217-26245-6-git-send-email-tanhuazhong@huawei.com |
|---|---|
| State | New |
| Series | [net-next,1/6] net: hns3: batch the page reference count updates |
On Mon, 14 Sep 2020 20:06:56 +0800 Huazhong Tan wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
>
> writel() can be used to order I/O vs memory by default when
> writing portable drivers. Use writel() to replace wmb() +
> writel_relaxed(), and writel() is dma_wmb() + writel_relaxed()
> for ARM64, so there is an optimization here because dma_wmb()
> is a lighter barrier than wmb().

Cool, although lots of drivers will need a change like this now.

And looks like memory-barriers.txt is slightly, eh, not coherent there,
between the documentation of writeX() and dma_wmb() :S

  3. A writeX() by a CPU thread to the peripheral will first wait for the
     completion of all prior writes to memory either issued by, or
     propagated to, the same thread. This ensures that writes by the CPU
     to an outbound DMA buffer allocated by dma_alloc_coherent() will be
     visible to a DMA engine when the CPU writes to its MMIO control
     register to trigger the transfer.

 (*) dma_wmb();
 (*) dma_rmb();

     These are for use with consistent memory to guarantee the ordering
     of writes or reads of shared memory accessible to both the CPU and a
     DMA capable device.

     For example, consider a device driver that shares memory with a device
     and uses a descriptor status value to indicate if the descriptor belongs
     to the device or the CPU, and a doorbell to notify it when new
     descriptors are available:

	if (desc->status != DEVICE_OWN) {
		/* do not read data until we own descriptor */
		dma_rmb();

		/* read/modify data */
		read_data = desc->data;
		desc->data = write_data;

		/* flush modifications before status update */
		dma_wmb();

		/* assign ownership */
		desc->status = DEVICE_OWN;

		/* notify device of new descriptors */
		writel(DESC_NOTIFY, doorbell);
	}

     The dma_rmb() allows us to guarantee the device has released ownership
     before we read the data from the descriptor, and the dma_wmb() allows
     us to guarantee the data is written to the descriptor before the device
     can see it now has ownership. Note that, when using writel(), a prior
     wmb() is not needed to guarantee that the cache coherent memory writes
     have completed before writing to the MMIO region. The cheaper
     writel_relaxed() does not provide this guarantee and must not be used
     here.
On 2020/9/15 5:45, Jakub Kicinski wrote:
> On Mon, 14 Sep 2020 20:06:56 +0800 Huazhong Tan wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>
>> writel() can be used to order I/O vs memory by default when
>> writing portable drivers. Use writel() to replace wmb() +
>> writel_relaxed(), and writel() is dma_wmb() + writel_relaxed()
>> for ARM64, so there is an optimization here because dma_wmb()
>> is a lighter barrier than wmb().
>
> Cool, although lots of drivers will need a change like this now.
>
> And looks like memory-barriers.txt is slightly, eh, not coherent there,
> between the documentation of writeX() and dma_wmb() :S
>
>   3. A writeX() by a CPU thread to the peripheral will first wait for the
>      completion of all prior writes to memory either issued by, or

"wait for the completion of all prior writes to memory" seems to match
the semantics of writel() here?

>      propagated to, the same thread. This ensures that writes by the CPU
>      to an outbound DMA buffer allocated by dma_alloc_coherent() will be

An "outbound DMA buffer" mapped by the streaming API can also be ordered
by the writel(). Is that what you meant by "not coherent"?

>      visible to a DMA engine when the CPU writes to its MMIO control
>      register to trigger the transfer.
>
>  (*) dma_wmb();
>  (*) dma_rmb();
>
>      These are for use with consistent memory to guarantee the ordering
>      of writes or reads of shared memory accessible to both the CPU and a
>      DMA capable device.
>
>      For example, consider a device driver that shares memory with a device
>      and uses a descriptor status value to indicate if the descriptor belongs
>      to the device or the CPU, and a doorbell to notify it when new
>      descriptors are available:
>
>	if (desc->status != DEVICE_OWN) {
>		/* do not read data until we own descriptor */
>		dma_rmb();
>
>		/* read/modify data */
>		read_data = desc->data;
>		desc->data = write_data;
>
>		/* flush modifications before status update */
>		dma_wmb();
>
>		/* assign ownership */
>		desc->status = DEVICE_OWN;
>
>		/* notify device of new descriptors */
>		writel(DESC_NOTIFY, doorbell);
>	}
>
>      The dma_rmb() allows us to guarantee the device has released ownership
>      before we read the data from the descriptor, and the dma_wmb() allows
>      us to guarantee the data is written to the descriptor before the device
>      can see it now has ownership. Note that, when using writel(), a prior
>      wmb() is not needed to guarantee that the cache coherent memory writes
>      have completed before writing to the MMIO region. The cheaper
>      writel_relaxed() does not provide this guarantee and must not be used
>      here.

I am not sure writel() has any implication here. My interpretation of
the above doc is that dma_wmb() is more appropriate when only
coherent/consistent memory needs to be ordered. If writel() is used,
then dma_wmb() or wmb() is unnecessary, see:

commit 5846581e3563 ("locking/memory-barriers.txt: Fix broken DMA vs.
MMIO ordering example")
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8490754..4a49a76 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1398,9 +1398,8 @@ static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num,
 	if (!ring->pending_buf)
 		return;
 
-	wmb(); /* Commit all data before submit */
-
-	hnae3_queue_xmit(ring->tqp, ring->pending_buf);
+	writel(ring->pending_buf,
+	       ring->tqp->io_base + HNS3_RING_TX_RING_TAIL_REG);
 	ring->pending_buf = 0;
 	WRITE_ONCE(ring->last_to_use, ring->next_to_use);
 }
@@ -2618,8 +2617,7 @@ static void hns3_nic_alloc_rx_buffers(struct hns3_enet_ring *ring,
 		ring_ptr_move_fw(ring, next_to_use);
 	}
 
-	wmb(); /* Make all data has been write before submit */
-	writel_relaxed(i, ring->tqp->io_base + HNS3_RING_RX_RING_HEAD_REG);
+	writel(i, ring->tqp->io_base + HNS3_RING_RX_RING_HEAD_REG);
 }
 
 static bool hns3_page_is_reusable(struct page *page)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 876dc09..20e62ce 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -539,9 +539,6 @@ static inline bool hns3_nic_resetting(struct net_device *netdev)
 #define hns3_write_dev(a, reg, value) \
 	hns3_write_reg((a)->io_base, (reg), (value))
 
-#define hnae3_queue_xmit(tqp, buf_num) writel_relaxed(buf_num, \
-	(tqp)->io_base + HNS3_RING_TX_RING_TAIL_REG)
-
 #define ring_to_dev(ring) ((ring)->dev)
 
 #define ring_to_netdev(ring) ((ring)->tqp_vector->napi.dev)