[net-next,v8,0/3] net: axienet: Introduce dmaengine

Message ID 1697570274-798170-1-git-send-email-radhey.shyam.pandey@amd.com

Message

Pandey, Radhey Shyam Oct. 17, 2023, 7:17 p.m. UTC
The axiethernet driver can use the dmaengine framework to communicate
with the Xilinx DMA engine drivers (AXIDMA, MCDMA). The motivation behind
this dmaengine adoption is to reuse the in-kernel Xilinx DMA engine
driver [1] and remove the redundant DMA programming sequence [2] from the
ethernet driver. This simplifies the ethernet driver and also makes it
generic, so it can be hooked to any compliant DMA IP, i.e. AXIDMA or
MCDMA, without modification.

The dmaengine framework was extended with metadata API support during
the axidma RFC [3] discussion. However, it still needs further
enhancements to make it well suited for ethernet use cases.
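
For reference, the pointer-mode metadata API consumed on the RX side looks
roughly like this (a minimal sketch against a completed descriptor 'desc';
LEN_APP is a driver-local index into the AXI DMA application words, not
part of the framework):

	size_t meta_len, meta_max_len;
	u32 *app;

	/* Borrow a pointer into the descriptor's metadata region. */
	app = dmaengine_desc_get_metadata_ptr(desc, &meta_len, &meta_max_len);
	if (!IS_ERR(app))
		rx_len = app[LEN_APP] & 0xFFFF;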

Comments, suggestions, thoughts to implement remaining functional
features are very welcome!

[1]: https://github.com/torvalds/linux/blob/master/drivers/dma/xilinx/xilinx_dma.c
[2]: https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/xilinx/xilinx_axienet_main.c#L238
[3]: http://lkml.iu.edu/hypermail/linux/kernel/1804.0/00367.html
[4]: https://lore.kernel.org/all/20221124102745.2620370-1-sarath.babu.naidu.gaddam@amd.com


Changes for v8:
- Use dev_consume_skb_any() in transmit callback.
- Fix queue stop logic in _xmit_dmaengine().
- Fix skb leak in _xmit_dmaengine() error path.
- In dmaengine tx path use dma_device pointer to call prep_slave_sg.
- In rx submit use rate limiting for mapping errors and fix error
  handling.
- Revert dev_err_probe from _init_dmaengine().
- Remove unnecessary new line after call to _init_dmaengine().
- Move free_irq(lp->eth_irq) to error path.
- Have separate netdev_ops for the dmaengine flow and drop the
  indirection for the xmit() function.
- Remove spurious new line and inline attribute from axienet_init_legacy_dma().
- Improve axienet_dma_tx_cb and axienet_rx_submit_desc documentation.

Changes in v7:
- Fix comment spaces.
- In xmit use correct XAE_FEATURE_PARTIAL_TX_CSUM define.
- Rename app to app_metadata.
- Switch to __netif_rx.
- In axienet_rx_submit_desc() add mapping error handling.
- Introduce new workaround 4/4 patch to enable lkp_test
  build coverage. To be dropped for mainline.

Changes in v6:
- Remove patches 1-7 as they were applied to the dmaengine tree in v5.
- Add Krzysztof's Reviewed-by tag to the dmaengine binding patch.
- Rename struct axi_skbuff to skbuf_dma_descriptor and remove the
  __packed attribute.
- Drop kmem_cache implementation and switch to using ring buffers.
- Remove __inline from axienet_init_dmaengine().
- Name labels after the target.
- Add error check for platform_get_irq_optional().
- Fix a double space, and remove empty lines between a call and its
  error check.

Changes in v5:
- Fix git am failure on net-next.
- Address DT binding review comments, i.e. modify the commit description
  to remove dmaengine framework references and instead describe how the
  axiethernet IP uses DMA channels.
- Fix "^[tr]x_chan[0-9]|1[0-5]$" -> "^[tr]x_chan([0-9]|1[0-5])$"
- Drop generic dmas description.
- Fix kmem_cache resource leak.
- Merge Xilinx DMA enhancements and optimization[4] into this series.

Changes in V4:
- Updated commit description about tx/rx channel names (1/3).
- Removed "dt-bindings" and "dmaengine" strings in subject (1/3).
- Extended dmas and dma-names to support MCDMA channel names (1/3).
- Renamed has_dmas to use_dmaengine (2/3).
- Removed the AXIENET_USE_DMA (2/3).
- Removed the AXIENET_USE_DMA (3/3).
- Added dev_err_probe for dma_request_chan error handling (3/3).
- Added kmem_cache_destroy to pair with kmem_cache_create in
  axienet_setup_dma_chan (3/3).

Changes in V3:
- Moved RFC to PATCH.
- Removed ethtool get/set coalesce, will be added later.
- Added backward compatible support.
- Split the dmaengine support patch of V2 into two patches(2/3 and 3/3).
https://lore.kernel.org/all/20220920055703.13246-4-sarath.babu.naidu.gaddam@amd.com/

Changes in V2:
- Add ethtool get/set coalesce and DMA reset using DMAengine framework.
- Add performance numbers.
- Remove .txt and change the name of file to xlnx,axiethernet.yaml.
- Fix DT check warning ('device_type' does not match any of the regexes:
  'pinctrl-[0-9]+', from schema: Documentation/devicetree/bindings/net/
  xilinx_axienet.yaml).

Radhey Shyam Pandey (2):
  dt-bindings: net: xlnx,axi-ethernet: Introduce DMA support
  net: axienet: Introduce dmaengine support

Sarath Babu Naidu Gaddam (1):
  net: axienet: Preparatory changes for dmaengine support

 .../bindings/net/xlnx,axi-ethernet.yaml       |  16 +
 drivers/net/ethernet/xilinx/Kconfig           |   1 +
 drivers/net/ethernet/xilinx/xilinx_axienet.h  |  35 +
 .../net/ethernet/xilinx/xilinx_axienet_main.c | 666 ++++++++++++++----
 4 files changed, 597 insertions(+), 121 deletions(-)

Comments

Pandey, Radhey Shyam Oct. 20, 2023, 7:27 p.m. UTC | #1
<snip>
> Minor nit: please use the reverse x-mas tree order

Sure, will fix it in next version.

<snip>
> You can avoid some duplicate code with:
> 		goto drop_skb;
> 
> and adding at the bottom of this function:
> 
> drop_skb:
> 	dev_kfree_skb_any(skb);
> 	return NETDEV_TX_OK;
> 

Agree, will switch to it in the next version.

<snip>

> You forgot to add the netif_txq_maybe_stop() call, as suggested by Jakub in
> the previous revision.

I was under the impression that these are multi-queue specific APIs, so I
skipped them. But I revisited the implementation, it is clear now, and I
modified the driver to use these lockless queue stopping/waking helpers.

+       txq = skb_get_tx_queue(lp->ndev, skb);
+       netdev_tx_sent_queue(txq, skb->len);
+       netif_txq_maybe_stop(txq, CIRC_SPACE(lp->tx_ring_head, lp->tx_ring_tail,
+                            TX_BD_NUM_MAX), MAX_SKB_FRAGS + 1, 2 * MAX_SKB_FRAGS);
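
On the completion side, the stop above would be paired with the wake
helper in the TX callback; a rough sketch (not the final code), assuming
the same ring indices and that the callback can reach the queue via
skbuf_dma->skb:

+       txq = skb_get_tx_queue(lp->ndev, skbuf_dma->skb);
+       netif_txq_completed_wake(txq, 1, skbuf_dma->skb->len,
+                                CIRC_SPACE(lp->tx_ring_head, lp->tx_ring_tail,
+                                           TX_BD_NUM_MAX),
+                                2 * MAX_SKB_FRAGS);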


However, in the netperf benchmark (TCP TX) I am seeing a dip in throughput
(~35-40 Mbps) when switching to these stop/wake helpers. Is that expected,
considering the extra logic of maintaining the dynamic queue and the
helpers themselves?

Also, during the throughput benchmarking the queue was never actually
stopped/woken up.


Throughput (10^6 bits/sec):
915.55 (v8 - without using lockless queue stop/wake helpers)

======
Switch to lockless queue stop/wake helpers

# netperf -H 192.168.10.20 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.20 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

131072  16384  16384    10.02     876.94


> 
> > +
> > +	skbuf_dma->skb = skb;
> > +	skbuf_dma->sg_len = sg_len;
> > +	dma_tx_desc->callback_param = lp;
> > +	dma_tx_desc->callback_result = axienet_dma_tx_cb;
> > +	dmaengine_submit(dma_tx_desc);
> > +	dma_async_issue_pending(lp->tx_chan);
> > +
> > +	return NETDEV_TX_OK;
> > +
> > +xmit_error_unmap_sg:
> > +	dma_unmap_sg(lp->dev, skbuf_dma->sgl, sg_len, DMA_TO_DEVICE);
> 
> If you need to drop the skb (as I suspect), you can reuse the drop_skb label
> here:
> 
> drop_skb:
> 	dev_kfree_skb_any(skb);
> > +	return NETDEV_TX_OK;

Yes, will make this change; it will also fix the skb leak.
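
With the drop_skb label reused, the tail of the function would then
look like:

xmit_error_unmap_sg:
	dma_unmap_sg(lp->dev, skbuf_dma->sgl, sg_len, DMA_TO_DEVICE);
drop_skb:
	dev_kfree_skb_any(skb);
	return NETDEV_TX_OK;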

> > +}
> > +
> >  /**
> >   * axienet_tx_poll - Invoked once a transmit is completed by the
> >   * Axi DMA Tx channel.
> > @@ -893,6 +1028,42 @@ axienet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> >  	return NETDEV_TX_OK;
> >  }
> >
> > +/**
> > + * axienet_dma_rx_cb - DMA engine callback for RX channel.
> > + * @data:       Pointer to the axienet_local structure.
> > + * @result:     error reporting through dmaengine_result.
> > + * This function is called by dmaengine driver for RX channel to notify
> > + * that the packet is received.
> > + */
> > +static void axienet_dma_rx_cb(void *data, const struct dmaengine_result *result)
> > +{
> > +	struct axienet_local *lp = data;
> > +	struct skbuf_dma_descriptor *skbuf_dma;
> > +	size_t meta_len, meta_max_len, rx_len;
> > +	struct sk_buff *skb;
> > +	u32 *app_metadata;
> 
> Minor nit: please respect the reverse x-mas tree order

Yes, will fix it in next version.
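
I.e. the locals in axienet_dma_rx_cb() reordered longest to shortest:

	struct skbuf_dma_descriptor *skbuf_dma;
	size_t meta_len, meta_max_len, rx_len;
	struct axienet_local *lp = data;
	struct sk_buff *skb;
	u32 *app_metadata;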

> 
> > +
> > +	skbuf_dma = axienet_get_rx_desc(lp, lp->rx_ring_tail++);
> > +	skb = skbuf_dma->skb;
> > +	app_metadata = dmaengine_desc_get_metadata_ptr(skbuf_dma->desc,
> > +						       &meta_len, &meta_max_len);
> > +	dma_unmap_single(lp->dev, skbuf_dma->dma_address, lp->max_frm_size,
> > +			 DMA_FROM_DEVICE);
> > +	/* TODO: Derive app word index programmatically */
> > +	rx_len = (app_metadata[LEN_APP] & 0xFFFF);
> > +	skb_put(skb, rx_len);
> > +	skb->protocol = eth_type_trans(skb, lp->ndev);
> > +	skb->ip_summed = CHECKSUM_NONE;
> > +
> > +	__netif_rx(skb);
> 
> It's a pity you can't leverage NAPI here.
> 
> I think that could be doable as a follow-up, but I'm unsure if that would fit
> the DMA engine model: in this callback you could cache the ready dma index
> (a single range should suffice) and schedule the napi instance. The actual
> dma processing will be done in napi poll.
> 
> Another possible follow-up could be introducing a "bulk" RX callback in the
> DMA engine, to mitigate the indirect call overhead on a burst of RX DMA
> completion - assuming the DMA engine actually generates such burst.

Agree, these are both feasible follow-ups, and I will start working on
them once this baseline dmaengine support series is done.
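
A very rough shape of the NAPI follow-up could be (hypothetical, not part
of this series; 'rx_ready' would be a new driver-private counter):

	static void axienet_dma_rx_cb(void *data,
				      const struct dmaengine_result *result)
	{
		struct axienet_local *lp = data;

		/* Only record the completion here; the descriptor walk and
		 * __netif_rx() would move into the napi poll routine.
		 */
		lp->rx_ready++;
		napi_schedule(&lp->napi_rx);
	}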
> 
> > +	u64_stats_update_begin(&lp->rx_stat_sync);
> > +	u64_stats_add(&lp->rx_packets, 1);
> > +	u64_stats_add(&lp->rx_bytes, rx_len);
> > +	u64_stats_update_end(&lp->rx_stat_sync);
> > +	axienet_rx_submit_desc(lp->ndev);
> > +	dma_async_issue_pending(lp->rx_chan);
> > +}
> > +
> >  /**
> >   * axienet_rx_poll - Triggered by RX ISR to complete the BD processing.
> >   * @napi:	Pointer to NAPI structure.
> > @@ -1126,6 +1297,150 @@ static irqreturn_t axienet_eth_irq(int irq, void *_ndev)
> >
> >  static void axienet_dma_err_handler(struct work_struct *work);
> >
> > +/**
> > + * axienet_rx_submit_desc - Submit the rx descriptors to dmaengine.
> > + * Allocate skbuff, map the scatterlist, obtain a descriptor, then add
> > + * the callback information and submit the descriptor.
> > + *
> > + * @ndev:	net_device pointer
> > + *
> > + * Return: 0 on success, non-zero error value on failure.
> > + */
> > +static int axienet_rx_submit_desc(struct net_device *ndev)
> > +{
> > +	struct dma_async_tx_descriptor *dma_rx_desc = NULL;
> > +	struct axienet_local *lp = netdev_priv(ndev);
> > +	struct skbuf_dma_descriptor *skbuf_dma;
> > +	struct sk_buff *skb;
> > +	dma_addr_t addr;
> > +	int ret;
> > +
> > +	skbuf_dma = axienet_get_rx_desc(lp, lp->rx_ring_head);
> > +	if (!skbuf_dma)
> > +		return -ENOMEM;
> 
> Minor nit: here a newline would make the code more readable

Accepted, will add it in the next version.

> 
> > +	lp->rx_ring_head++;
> > +	skb = netdev_alloc_skb(ndev, lp->max_frm_size);
> > +	if (!skb)
> > +		return -ENOMEM;
> 
> Another possible follow-up: usually the skb header is allocated just before
> sending it to the network stack (e.g. just before the
> __netif_rx() call) to be cache friendly. Here you could allocate just the data
> part and later use e.g. build_skb_around()

Sure, will explore it.
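
A hedged sketch of that idea (illustrative only, not part of this series):
allocate just the data buffer at submit time, then wrap it in an skb right
before handing the frame to the stack:

	/* submit path: data-only allocation, then dma_map_single() on it */
	unsigned int truesize = SKB_DATA_ALIGN(lp->max_frm_size) +
				SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	void *buf = netdev_alloc_frag(truesize);

	/* completion path, just before __netif_rx(): the skb header is
	 * allocated cache-hot around the already-filled buffer.
	 */
	skb = build_skb(buf, truesize);
	if (skb)
		skb_put(skb, rx_len);

build_skb_around() would be the variant to use when the sk_buff itself
comes from a cache.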
> 
> > +
> > +	sg_init_table(skbuf_dma->sgl, 1);
> > +	addr = dma_map_single(lp->dev, skb->data, lp->max_frm_size,
> > +			      DMA_FROM_DEVICE);
> > +	if (unlikely(dma_mapping_error(lp->dev, addr))) {
> > +		if (net_ratelimit())
> > +			netdev_err(ndev, "DMA mapping error\n");
> > +		ret = -ENOMEM;
> > +		goto rx_submit_err_free_skb;
> > +	}
> > +	sg_dma_address(skbuf_dma->sgl) = addr;
> > +	sg_dma_len(skbuf_dma->sgl) = lp->max_frm_size;
> > +	dma_rx_desc = dmaengine_prep_slave_sg(lp->rx_chan, skbuf_dma->sgl,
> > +					      1, DMA_DEV_TO_MEM,
> > +					      DMA_PREP_INTERRUPT);
> > +	if (!dma_rx_desc) {
> > +		ret = -EINVAL;
> > +		goto rx_submit_err_unmap_skb;
> > +	}
> > +
> > +	skbuf_dma->skb = skb;
> > +	skbuf_dma->dma_address = sg_dma_address(skbuf_dma->sgl);
> > +	skbuf_dma->desc = dma_rx_desc;
> > +	dma_rx_desc->callback_param = lp;
> > +	dma_rx_desc->callback_result = axienet_dma_rx_cb;
> > +	dmaengine_submit(dma_rx_desc);
> > +
> > +	return 0;
> > +
> > +rx_submit_err_unmap_skb:
> > +	dma_unmap_single(lp->dev, addr, lp->max_frm_size,
> > +			 DMA_FROM_DEVICE);
> > +rx_submit_err_free_skb:
> > +	dev_kfree_skb(skb);
> > +	return ret;
> 
> It looks like the error code is ignored by the caller. Possibly you can change
> this to a 'void' function.

Will make it void in the next version.

Thanks,
Radhey