net: ftgmac100: Ensure tx descriptor updates are visible

Message ID 20201020220639.130696-1-joel@jms.id.au
State New
Headers show
Series
  • net: ftgmac100: Ensure tx descriptor updates are visible
Related show

Commit Message

Joel Stanley Oct. 20, 2020, 10:06 p.m.
We must ensure the tx descriptor updates are visible before updating
the tx pointer.

This resolves the tx hangs observed on the 2600 when running iperf:

root@ast2600:~# iperf3 -c 192.168.86.146 -R
Connecting to host 192.168.86.146, port 5201
Reverse mode, remote host 192.168.86.146 is sending
[  5] local 192.168.86.173 port 43886 connected to 192.168.86.146 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  90.7 MBytes   760 Mbits/sec
[  5]   1.00-2.00   sec  91.7 MBytes   769 Mbits/sec
[  5]   2.00-3.00   sec  91.7 MBytes   770 Mbits/sec
[  5]   3.00-4.00   sec  91.7 MBytes   769 Mbits/sec
[  5]   4.00-5.00   sec  91.8 MBytes   771 Mbits/sec
[  5]   5.00-6.00   sec  91.8 MBytes   771 Mbits/sec
[  5]   6.00-7.00   sec  91.9 MBytes   771 Mbits/sec
[  5]   7.00-8.00   sec  91.4 MBytes   767 Mbits/sec
[  5]   8.00-9.00   sec  91.3 MBytes   766 Mbits/sec
[  5]   9.00-10.00  sec  91.9 MBytes   771 Mbits/sec
[  5]  10.00-11.00  sec  91.8 MBytes   770 Mbits/sec
[  5]  11.00-12.00  sec  91.8 MBytes   770 Mbits/sec
[  5]  12.00-13.00  sec  90.6 MBytes   761 Mbits/sec
[  5]  13.00-14.00  sec  45.2 KBytes   370 Kbits/sec
[  5]  14.00-15.00  sec  0.00 Bytes  0.00 bits/sec
[  5]  15.00-16.00  sec  0.00 Bytes  0.00 bits/sec
[  5]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec
[  5]  17.00-18.00  sec  0.00 Bytes  0.00 bits/sec
[   67.031671] ------------[ cut here ]------------
[   67.036870] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x2dc/0x300
[   67.046123] NETDEV WATCHDOG: eth2 (ftgmac100): transmit queue 0 timed out

Fixes: 52c0cae87465 ("ftgmac100: Remove tx descriptor accessors")
Signed-off-by: Joel Stanley <joel@jms.id.au>
---
 drivers/net/ethernet/faraday/ftgmac100.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Benjamin Herrenschmidt Oct. 21, 2020, midnight | #1
On Wed, 2020-10-21 at 08:36 +1030, Joel Stanley wrote:
> We must ensure the tx descriptor updates are visible before updating
> the tx pointer.
> 
> This resolves the tx hangs observed on the 2600 when running iperf:

To clarify the comment here. This doesn't ensure they are visible to
the hardware but to other CPUs. This is the ordering vs start_xmit and
tx_complete.

Cheers,
Ben.

> root@ast2600:~# iperf3 -c 192.168.86.146 -R
> Connecting to host 192.168.86.146, port 5201
> Reverse mode, remote host 192.168.86.146 is sending
> [  5] local 192.168.86.173 port 43886 connected to 192.168.86.146
> port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec  90.7 MBytes   760 Mbits/sec
> [  5]   1.00-2.00   sec  91.7 MBytes   769 Mbits/sec
> [  5]   2.00-3.00   sec  91.7 MBytes   770 Mbits/sec
> [  5]   3.00-4.00   sec  91.7 MBytes   769 Mbits/sec
> [  5]   4.00-5.00   sec  91.8 MBytes   771 Mbits/sec
> [  5]   5.00-6.00   sec  91.8 MBytes   771 Mbits/sec
> [  5]   6.00-7.00   sec  91.9 MBytes   771 Mbits/sec
> [  5]   7.00-8.00   sec  91.4 MBytes   767 Mbits/sec
> [  5]   8.00-9.00   sec  91.3 MBytes   766 Mbits/sec
> [  5]   9.00-10.00  sec  91.9 MBytes   771 Mbits/sec
> [  5]  10.00-11.00  sec  91.8 MBytes   770 Mbits/sec
> [  5]  11.00-12.00  sec  91.8 MBytes   770 Mbits/sec
> [  5]  12.00-13.00  sec  90.6 MBytes   761 Mbits/sec
> [  5]  13.00-14.00  sec  45.2 KBytes   370 Kbits/sec
> [  5]  14.00-15.00  sec  0.00 Bytes  0.00 bits/sec
> [  5]  15.00-16.00  sec  0.00 Bytes  0.00 bits/sec
> [  5]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec
> [  5]  17.00-18.00  sec  0.00 Bytes  0.00 bits/sec
> [   67.031671] ------------[ cut here ]------------
> [   67.036870] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:442
> dev_watchdog+0x2dc/0x300
> [   67.046123] NETDEV WATCHDOG: eth2 (ftgmac100): transmit queue 0
> timed out
> 
> Fixes: 52c0cae87465 ("ftgmac100: Remove tx descriptor accessors")
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
>  drivers/net/ethernet/faraday/ftgmac100.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index 331d4bdd4a67..15cdfeb135b0 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -653,6 +653,11 @@ static bool ftgmac100_tx_complete_packet(struct
> ftgmac100 *priv)
>  	ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
>  	txdes->txdes0 = cpu_to_le32(ctl_stat & priv-
> >txdes0_edotr_mask);
>  
> +	/* Ensure the descriptor config is visible before setting the
> tx
> +	 * pointer.
> +	 */
> +	smp_wmb();
> +
>  	priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv,
> pointer);
>  
>  	return true;
> @@ -806,6 +811,11 @@ static netdev_tx_t
> ftgmac100_hard_start_xmit(struct sk_buff *skb,
>  	dma_wmb();
>  	first->txdes0 = cpu_to_le32(f_ctl_stat);
>  
> +	/* Ensure the descriptor config is visible before setting the
> tx
> +	 * pointer.
> +	 */
> +	smp_wmb();
> +
>  	/* Update next TX pointer */
>  	priv->tx_pointer = pointer;
>
Joel Stanley Oct. 21, 2020, 3:37 a.m. | #2
On Wed, 21 Oct 2020 at 00:00, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Wed, 2020-10-21 at 08:36 +1030, Joel Stanley wrote:
> > We must ensure the tx descriptor updates are visible before updating
> > the tx pointer.
> >
> > This resolves the tx hangs observed on the 2600 when running iperf:
>
> To clarify the comment here. This doesn't ensure they are visible to
> the hardware but to other CPUs. This is the ordering vs start_xmit and
> tx_complete.

Thanks. Let me know if this makes sense, or if I'm completely off the mark.

How is this for the commit message:

This resolves the tx hangs observed on the 2600 when running iperf.
This is ensuring the setting of the OWN bit in txdes0 of the
descriptor is visible to other CPUs before updating the pointer. Doing
this provides ordering between start_xmit and tx_complete.

and then I'll put:

        /* Ensure the descriptor config is visible to other CPUs before setting
         * the tx pointer. This ensures ordering against start_xmit which checks
          * the OWN bit before proceeding.
         */

and similar for tx_complete?

>
> Cheers,
> Ben.
>
> > root@ast2600:~# iperf3 -c 192.168.86.146 -R
> > Connecting to host 192.168.86.146, port 5201
> > Reverse mode, remote host 192.168.86.146 is sending
> > [  5] local 192.168.86.173 port 43886 connected to 192.168.86.146
> > port 5201
> > [ ID] Interval           Transfer     Bitrate
> > [  5]   0.00-1.00   sec  90.7 MBytes   760 Mbits/sec
> > [  5]   1.00-2.00   sec  91.7 MBytes   769 Mbits/sec
> > [  5]   2.00-3.00   sec  91.7 MBytes   770 Mbits/sec
> > [  5]   3.00-4.00   sec  91.7 MBytes   769 Mbits/sec
> > [  5]   4.00-5.00   sec  91.8 MBytes   771 Mbits/sec
> > [  5]   5.00-6.00   sec  91.8 MBytes   771 Mbits/sec
> > [  5]   6.00-7.00   sec  91.9 MBytes   771 Mbits/sec
> > [  5]   7.00-8.00   sec  91.4 MBytes   767 Mbits/sec
> > [  5]   8.00-9.00   sec  91.3 MBytes   766 Mbits/sec
> > [  5]   9.00-10.00  sec  91.9 MBytes   771 Mbits/sec
> > [  5]  10.00-11.00  sec  91.8 MBytes   770 Mbits/sec
> > [  5]  11.00-12.00  sec  91.8 MBytes   770 Mbits/sec
> > [  5]  12.00-13.00  sec  90.6 MBytes   761 Mbits/sec
> > [  5]  13.00-14.00  sec  45.2 KBytes   370 Kbits/sec
> > [  5]  14.00-15.00  sec  0.00 Bytes  0.00 bits/sec
> > [  5]  15.00-16.00  sec  0.00 Bytes  0.00 bits/sec
> > [  5]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec
> > [  5]  17.00-18.00  sec  0.00 Bytes  0.00 bits/sec
> > [   67.031671] ------------[ cut here ]------------
> > [   67.036870] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:442
> > dev_watchdog+0x2dc/0x300
> > [   67.046123] NETDEV WATCHDOG: eth2 (ftgmac100): transmit queue 0
> > timed out
> >
> > Fixes: 52c0cae87465 ("ftgmac100: Remove tx descriptor accessors")
> > Signed-off-by: Joel Stanley <joel@jms.id.au>
> > ---
> >  drivers/net/ethernet/faraday/ftgmac100.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> > b/drivers/net/ethernet/faraday/ftgmac100.c
> > index 331d4bdd4a67..15cdfeb135b0 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -653,6 +653,11 @@ static bool ftgmac100_tx_complete_packet(struct
> > ftgmac100 *priv)
> >       ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
> >       txdes->txdes0 = cpu_to_le32(ctl_stat & priv-
> > >txdes0_edotr_mask);
> >
> > +     /* Ensure the descriptor config is visible before setting the
> > tx
> > +      * pointer.
> > +      */
> > +     smp_wmb();
> > +
> >       priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv,
> > pointer);
> >
> >       return true;
> > @@ -806,6 +811,11 @@ static netdev_tx_t
> > ftgmac100_hard_start_xmit(struct sk_buff *skb,
> >       dma_wmb();
> >       first->txdes0 = cpu_to_le32(f_ctl_stat);
> >
> > +     /* Ensure the descriptor config is visible before setting the
> > tx
> > +      * pointer.
> > +      */
> > +     smp_wmb();
> > +
> >       /* Update next TX pointer */
> >       priv->tx_pointer = pointer;
> >
>
David Laight Oct. 21, 2020, 8:18 a.m. | #3
From: Benjamin Herrenschmidt

> Sent: 21 October 2020 01:00

> 

> On Wed, 2020-10-21 at 08:36 +1030, Joel Stanley wrote:

> > We must ensure the tx descriptor updates are visible before updating

> > the tx pointer.

> >

> > This resolves the tx hangs observed on the 2600 when running iperf:

> 

> To clarify the comment here. This doesn't ensure they are visible to

> the hardware but to other CPUs. This is the ordering vs start_xmit and

> tx_complete.


You need two barriers.
1) after making the data buffers available before transferring
the descriptor ownership to the device.
2) after transferring the ownership before 'kicking' the mac engine.

The first is needed because the mac engine can poll the descriptors
at any time (eg on completing the previous transmit).
This stops it transmitting garbage.

The second makes sure it finds the descriptor you've just set.
This stops delays before sending the packet.
(But it will get sent later.)

For (2) dma_wmb() is the documented barrier.

I'm not sure which barrier you need for (1).
smp_wmb() would be right if the reader were another cpu,
but it is (at most) a compile barrier on UP kernels.
So you need something stronger than smp_wmb().
On a TSO system (which yours probably is) a compile barrier
is probably sufficient, but if memory writes can get re-ordered
it needs to be a stronger barrier - but not necessarily as strong
as dma_wmb().

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Arnd Bergmann Oct. 21, 2020, 12:40 p.m. | #4
On Wed, Oct 21, 2020 at 12:39 PM Joel Stanley <joel@jms.id.au> wrote:

>

> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c

> index 331d4bdd4a67..15cdfeb135b0 100644

> --- a/drivers/net/ethernet/faraday/ftgmac100.c

> +++ b/drivers/net/ethernet/faraday/ftgmac100.c

> @@ -653,6 +653,11 @@ static bool ftgmac100_tx_complete_packet(struct ftgmac100 *priv)

>         ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);

>         txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);

>

> +       /* Ensure the descriptor config is visible before setting the tx

> +        * pointer.

> +        */

> +       smp_wmb();

> +

>         priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);

>

>         return true;

> @@ -806,6 +811,11 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,

>         dma_wmb();

>         first->txdes0 = cpu_to_le32(f_ctl_stat);

>

> +       /* Ensure the descriptor config is visible before setting the tx

> +        * pointer.

> +        */

> +       smp_wmb();

> +


Shouldn't these be paired with smp_rmb() on the reader side?

      Arnd
Joel Stanley Oct. 22, 2020, 6:35 a.m. | #5
On Wed, 21 Oct 2020 at 12:40, Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Wed, Oct 21, 2020 at 12:39 PM Joel Stanley <joel@jms.id.au> wrote:
>
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > index 331d4bdd4a67..15cdfeb135b0 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -653,6 +653,11 @@ static bool ftgmac100_tx_complete_packet(struct ftgmac100 *priv)
> >         ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
> >         txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);
> >
> > +       /* Ensure the descriptor config is visible before setting the tx
> > +        * pointer.
> > +        */
> > +       smp_wmb();
> > +
> >         priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);
> >
> >         return true;
> > @@ -806,6 +811,11 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
> >         dma_wmb();
> >         first->txdes0 = cpu_to_le32(f_ctl_stat);
> >
> > +       /* Ensure the descriptor config is visible before setting the tx
> > +        * pointer.
> > +        */
> > +       smp_wmb();
> > +
>
> Shouldn't these be paired with smp_rmb() on the reader side?

Now that I've read memory-barriers.txt, yes, they should.

On my clarification of the description of the patch, I thought it was
about making sure that the txdes0 store (and   thus setting of the
ownership bit) happens before the reader side checks that bit.

But we're not, we're making sure all of the writes to the descriptor
happen before updating the pointer, so when the reader side loads the
ownership bit in txdes0, it sees that store to txdes0 at that pointer.

I'll add in the read side barrier(s) and send a v2.

Cheers,

Joel
Benjamin Herrenschmidt Oct. 22, 2020, 7:37 a.m. | #6
On Wed, 2020-10-21 at 08:18 +0000, David Laight wrote:
> From: Benjamin Herrenschmidt

> > Sent: 21 October 2020 01:00

> > 

> > On Wed, 2020-10-21 at 08:36 +1030, Joel Stanley wrote:

> > > We must ensure the tx descriptor updates are visible before updating

> > > the tx pointer.

> > > 

> > > This resolves the tx hangs observed on the 2600 when running iperf:

> > 

> > To clarify the comment here. This doesn't ensure they are visible to

> > the hardware but to other CPUs. This is the ordering vs start_xmit and

> > tx_complete.

> 

> You need two barriers.

> 1) after making the data buffers available before transferring

> the descriptor ownership to the device.

> 2) after transferring the ownership before 'kicking' the mac engine.

> 

> The first is needed because the mac engine can poll the descriptors

> at any time (eg on completing the previous transmit).

> This stops it transmitting garbage.

> 

> The second makes sure it finds the descriptor you've just set.

> This stops delays before sending the packet.

> (But it will get sent later.)


The above is unrelated to this patch. This isn't about fixing any
device <-> CPU ordering or interaction but purely about ensuring proper
ordering between start_xmit and tx packet cleanup. IE. We are looking
at two different issues with this driver.

> For (2) dma_wmb() is the documented barrier.

> 

> I'm not sure which barrier you need for (1).

> smp_wmb() would be right if the reader were another cpu,

> but it is (at most) a compile barrier on UP kernels.

> So you need something stronger than smp_wmb().


There should already be sufficient barriers for that in the driver
(except for the HW bug mentioned earlier).

> On a TSO system (which yours probably is) a compile barrier

> is probably sufficient, but if memory writes can get re-ordered

> it needs to be a stronger barrier - but not necessarily as strong

> as dma_wmb().

> 

> 	David

> 

> -

> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK

> Registration No: 1397386 (Wales)
Benjamin Herrenschmidt Oct. 22, 2020, 7:41 a.m. | #7
On Wed, 2020-10-21 at 14:40 +0200, Arnd Bergmann wrote:
> On Wed, Oct 21, 2020 at 12:39 PM Joel Stanley <joel@jms.id.au> wrote:

> 

> > 

> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c

> > index 331d4bdd4a67..15cdfeb135b0 100644

> > --- a/drivers/net/ethernet/faraday/ftgmac100.c

> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c

> > @@ -653,6 +653,11 @@ static bool ftgmac100_tx_complete_packet(struct ftgmac100 *priv)

> >         ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);

> >         txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);

> > 

> > +       /* Ensure the descriptor config is visible before setting the tx

> > +        * pointer.

> > +        */

> > +       smp_wmb();

> > +

> >         priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);

> > 

> >         return true;

> > @@ -806,6 +811,11 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,

> >         dma_wmb();

> >         first->txdes0 = cpu_to_le32(f_ctl_stat);

> > 

> > +       /* Ensure the descriptor config is visible before setting the tx

> > +        * pointer.

> > +        */

> > +       smp_wmb();

> > +

> 

> Shouldn't these be paired with smp_rmb() on the reader side?


(Not near the code right now) I *think* the reader already has them
where it matters but I might have overlooked something when I quickly
checked the other day.

Cheers,
Ben.
Joel Stanley Oct. 28, 2020, 4:47 a.m. | #8
On Thu, 22 Oct 2020 at 07:41, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Wed, 2020-10-21 at 14:40 +0200, Arnd Bergmann wrote:
> > On Wed, Oct 21, 2020 at 12:39 PM Joel Stanley <joel@jms.id.au> wrote:
> >
> > >
> > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > > index 331d4bdd4a67..15cdfeb135b0 100644
> > > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > > @@ -653,6 +653,11 @@ static bool ftgmac100_tx_complete_packet(struct ftgmac100 *priv)
> > >         ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
> > >         txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);
> > >
> > > +       /* Ensure the descriptor config is visible before setting the tx
> > > +        * pointer.
> > > +        */
> > > +       smp_wmb();
> > > +
> > >         priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);
> > >
> > >         return true;
> > > @@ -806,6 +811,11 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
> > >         dma_wmb();
> > >         first->txdes0 = cpu_to_le32(f_ctl_stat);
> > >
> > > +       /* Ensure the descriptor config is visible before setting the tx
> > > +        * pointer.
> > > +        */
> > > +       smp_wmb();
> > > +
> >
> > Shouldn't these be paired with smp_rmb() on the reader side?
>
> (Not near the code right now) I *think* the reader already has them
> where it matters but I might have overlooked something when I quickly
> checked the other day.

Do we need a read barrier at the start of ftgmac100_tx_complete_packet?

        pointer = priv->tx_clean_pointer;
<--- here
        txdes = &priv->txdes[pointer];

        ctl_stat = le32_to_cpu(txdes->txdes0);
        if (ctl_stat & FTGMAC100_TXDES0_TXDMA_OWN)
                return false;

This was the only spot I could see that might require one.

Cheers,

Joel
Arnd Bergmann Oct. 28, 2020, 10:51 a.m. | #9
On Wed, Oct 28, 2020 at 5:47 AM Joel Stanley <joel@jms.id.au> wrote:
>
> On Thu, 22 Oct 2020 at 07:41, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > > >
> > > > +       /* Ensure the descriptor config is visible before setting the tx
> > > > +        * pointer.
> > > > +        */
> > > > +       smp_wmb();
> > > > +
> > >
> > > Shouldn't these be paired with smp_rmb() on the reader side?
> >
> > (Not near the code right now) I *think* the reader already has them
> > where it matters but I might have overlooked something when I quickly
> > checked the other day.
>
> Do we need a read barrier at the start of ftgmac100_tx_complete_packet?
>
>         pointer = priv->tx_clean_pointer;
> <--- here
>         txdes = &priv->txdes[pointer];
>
>         ctl_stat = le32_to_cpu(txdes->txdes0);
>         if (ctl_stat & FTGMAC100_TXDES0_TXDMA_OWN)
>                 return false;
>
> This was the only spot I could see that might require one.

No, I don't think this is the one, since tx_clean_pointer is not updated
in the other CPU, only this one. From what I can tell, you have
a communication between three concurrent threads in the TX
path:

a) one CPU runs start_xmit. It reads tx_clean_pointer
    from the second CPU, writes the descriptors for the hardware,
    notifies the hardware and updates tx_pointer.

b) the hardware gets kicked by start_xmit, it reads the
     descriptors, reads the data and updates the descriptors.
     it may send an interrupt, which is not important here.

c) a second CPU runs the poll() function. It reads the
    tx_pointer, reads the descriptors, updates the descriptors
    and updates tx_clean_pointer.

Things get a bit confusing because the tx_pointer and
tx_clean_pointer variables are hidden behind macros
and both sides repeatedly read and write them, with no
serialization.

This is what I would try to untangle it:

- mark both tx_pointer and tx_clean_pointer as
  ____cacheline_aligned_in_smp to avoid the cacheline
 pingpong between the two CPUs

- Use smp_load_acquire() to read a pointer that may have
  been updated by the other thread, and smp_store_release()
  to update the one the other side will read.

- pull these accesses into the callers and only do them
  once if possible

- reconsider the "keep poll() going as long as tx
  packets are queued" logic, and instead serialize
  against the packets that are actually completed
  by the hardware.

        Arnd

Patch

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 331d4bdd4a67..15cdfeb135b0 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -653,6 +653,11 @@  static bool ftgmac100_tx_complete_packet(struct ftgmac100 *priv)
 	ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
 	txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);
 
+	/* Ensure the descriptor config is visible before setting the tx
+	 * pointer.
+	 */
+	smp_wmb();
+
 	priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);
 
 	return true;
@@ -806,6 +811,11 @@  static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
 	dma_wmb();
 	first->txdes0 = cpu_to_le32(f_ctl_stat);
 
+	/* Ensure the descriptor config is visible before setting the tx
+	 * pointer.
+	 */
+	smp_wmb();
+
 	/* Update next TX pointer */
 	priv->tx_pointer = pointer;