mbox series

[net,0/7] Bugfixes in Microsemi Ocelot switch driver

Message ID 20200915182229.69529-1-olteanv@gmail.com
Headers show
Series Bugfixes in Microsemi Ocelot switch driver | expand

Message

Vladimir Oltean Sept. 15, 2020, 6:22 p.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

This is a series of 7 assorted patches for "net", on the drivers for the
VSC7514 MIPS switch (Ocelot-1), the VSC9953 PowerPC (Seville), and a few
more that are common to all supported devices since they are in the
common library portion.

Vladimir Oltean (7):
  net: mscc: ocelot: fix race condition with TX timestamping
  net: mscc: ocelot: add locking for the port TX timestamp ID
  net: dsa: seville: fix buffer size of the queue system
  net: mscc: ocelot: check for errors on memory allocation of ports
  net: mscc: ocelot: error checking when calling ocelot_init()
  net: mscc: ocelot: refactor ports parsing code into a dedicated
    function
  net: mscc: ocelot: unregister net devices on unbind

 drivers/net/dsa/ocelot/felix.c             |   5 +-
 drivers/net/dsa/ocelot/seville_vsc9953.c   |   2 +-
 drivers/net/ethernet/mscc/ocelot.c         |  13 +-
 drivers/net/ethernet/mscc/ocelot_net.c     |  12 +-
 drivers/net/ethernet/mscc/ocelot_vsc7514.c | 234 ++++++++++++---------
 include/soc/mscc/ocelot.h                  |   1 +
 net/dsa/tag_ocelot.c                       |  11 +-
 7 files changed, 168 insertions(+), 110 deletions(-)

Comments

Horatiu Vultur Sept. 15, 2020, 9:19 p.m. UTC | #1
The 09/15/2020 21:22, Vladimir Oltean wrote:
> 

> From: Vladimir Oltean <vladimir.oltean@nxp.com>

> 

> This is a series of 7 assorted patches for "net", on the drivers for the

> VSC7514 MIPS switch (Ocelot-1), the VSC9953 PowerPC (Seville), and a few

> more that are common to all supported devices since they are in the

> common library portion.


I have looked over ocelot changes and they look fine to me.

Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>


> 

> Vladimir Oltean (7):

>   net: mscc: ocelot: fix race condition with TX timestamping

>   net: mscc: ocelot: add locking for the port TX timestamp ID

>   net: dsa: seville: fix buffer size of the queue system

>   net: mscc: ocelot: check for errors on memory allocation of ports

>   net: mscc: ocelot: error checking when calling ocelot_init()

>   net: mscc: ocelot: refactor ports parsing code into a dedicated

>     function

>   net: mscc: ocelot: unregister net devices on unbind

> 

>  drivers/net/dsa/ocelot/felix.c             |   5 +-

>  drivers/net/dsa/ocelot/seville_vsc9953.c   |   2 +-

>  drivers/net/ethernet/mscc/ocelot.c         |  13 +-

>  drivers/net/ethernet/mscc/ocelot_net.c     |  12 +-

>  drivers/net/ethernet/mscc/ocelot_vsc7514.c | 234 ++++++++++++---------

>  include/soc/mscc/ocelot.h                  |   1 +

>  net/dsa/tag_ocelot.c                       |  11 +-

>  7 files changed, 168 insertions(+), 110 deletions(-)

> 

> --

> 2.25.1

> 


-- 
/Horatiu
Alexandre Belloni Sept. 16, 2020, 9:56 a.m. UTC | #2
Hi,

On 15/09/2020 21:22:22+0300, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> This is a series of 7 assorted patches for "net", on the drivers for the
> VSC7514 MIPS switch (Ocelot-1), the VSC9953 PowerPC (Seville), and a few
> more that are common to all supported devices since they are in the
> common library portion.
> 
> Vladimir Oltean (7):
>   net: mscc: ocelot: fix race condition with TX timestamping
>   net: mscc: ocelot: add locking for the port TX timestamp ID
>   net: dsa: seville: fix buffer size of the queue system
>   net: mscc: ocelot: check for errors on memory allocation of ports
>   net: mscc: ocelot: error checking when calling ocelot_init()
>   net: mscc: ocelot: refactor ports parsing code into a dedicated
>     function
>   net: mscc: ocelot: unregister net devices on unbind
> 
>  drivers/net/dsa/ocelot/felix.c             |   5 +-
>  drivers/net/dsa/ocelot/seville_vsc9953.c   |   2 +-
>  drivers/net/ethernet/mscc/ocelot.c         |  13 +-
>  drivers/net/ethernet/mscc/ocelot_net.c     |  12 +-
>  drivers/net/ethernet/mscc/ocelot_vsc7514.c | 234 ++++++++++++---------
>  include/soc/mscc/ocelot.h                  |   1 +
>  net/dsa/tag_ocelot.c                       |  11 +-
>  7 files changed, 168 insertions(+), 110 deletions(-)
> 

This series is leading to multiple crashes on my vc7524 board. I'm
trying to pinpoint the problematic commits
Alexandre Belloni Sept. 16, 2020, 11:12 a.m. UTC | #3
On 15/09/2020 21:22:24+0300, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> The ocelot_port->ts_id is used to:
> - populate skb->cb[0] for matching the TX timestamp in the PTP IRQ with
>   an skb.
> - populate the REW_OP from the injection header of the ongoing skb.
> Only then is ocelot_port->ts_id incremented.
> 
> This is a problem because, at least theoretically, another timestampable
> skb might use the same ocelot_port->ts_id before that is incremented. So
> the logic of using and incrementing the timestamp id should be atomic
> per port.
> 
> The solution is to use the global ocelot_port->ts_id only while
> protected by the associated ocelot_port->ts_id_lock. That's where we
> populate skb->cb[0]. Note that for ocelot,
> ocelot_port_add_txtstamp_skb() is called for the actual skb, but for
> felix, it is called for the skb's clone. That is something which will
> also be changed.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/net/ethernet/mscc/ocelot.c     | 13 ++++++++++++-
>  drivers/net/ethernet/mscc/ocelot_net.c |  6 ++----
>  include/soc/mscc/ocelot.h              |  1 +
>  net/dsa/tag_ocelot.c                   | 11 +++++++----
>  4 files changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
> index 5abb7d2b0a9e..8caf3afd464d 100644
> --- a/drivers/net/ethernet/mscc/ocelot.c
> +++ b/drivers/net/ethernet/mscc/ocelot.c
> @@ -421,10 +421,15 @@ int ocelot_port_add_txtstamp_skb(struct ocelot_port *ocelot_port,
>  
>  	if (ocelot->ptp && shinfo->tx_flags & SKBTX_HW_TSTAMP &&
>  	    ocelot_port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP) {
> +		spin_lock(&ocelot_port->ts_id_lock);
> +
>  		shinfo->tx_flags |= SKBTX_IN_PROGRESS;
>  		/* Store timestamp ID in cb[0] of sk_buff */
> -		skb->cb[0] = ocelot_port->ts_id % 4;
> +		skb->cb[0] = ocelot_port->ts_id;
> +		ocelot_port->ts_id = (ocelot_port->ts_id + 1) % 4;
>  		skb_queue_tail(&ocelot_port->tx_skbs, skb);
> +
> +		spin_unlock(&ocelot_port->ts_id_lock);
>  		return 0;
>  	}
>  	return -ENODATA;
> @@ -1443,6 +1448,12 @@ int ocelot_init(struct ocelot *ocelot)
>  	if (!ocelot->stats_queue)
>  		return -ENOMEM;
>  
> +	for (port = 0; port < ocelot->num_phys_ports; port++) {
> +		struct ocelot_port *ocelot_port = ocelot->ports[port];
> +

At this point, ocelot_port is NULL because ocelot_init is called before
the port structures are allocated in mscc_ocelot_probe

> +		spin_lock_init(&ocelot_port->ts_id_lock);
> +	}
> +
>  	INIT_LIST_HEAD(&ocelot->multicast);
>  	ocelot_mact_init(ocelot);
>  	ocelot_vlan_init(ocelot);
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> index cacabc23215a..8490e42e9e2d 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -349,10 +349,8 @@ static int ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	if (ocelot->ptp && shinfo->tx_flags & SKBTX_HW_TSTAMP) {
>  		info.rew_op = ocelot_port->ptp_cmd;
> -		if (ocelot_port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP) {
> -			info.rew_op |= (ocelot_port->ts_id  % 4) << 3;
> -			ocelot_port->ts_id++;
> -		}
> +		if (ocelot_port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP)
> +			info.rew_op |= skb->cb[0] << 3;
>  	}
>  
>  	ocelot_gen_ifh(ifh, &info);
> diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> index da369b12005f..4521dd602ddc 100644
> --- a/include/soc/mscc/ocelot.h
> +++ b/include/soc/mscc/ocelot.h
> @@ -566,6 +566,7 @@ struct ocelot_port {
>  	u8				ptp_cmd;
>  	struct sk_buff_head		tx_skbs;
>  	u8				ts_id;
> +	spinlock_t			ts_id_lock;
>  
>  	phy_interface_t			phy_mode;
>  
> diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
> index 42f327c06dca..b4fc05cafaa6 100644
> --- a/net/dsa/tag_ocelot.c
> +++ b/net/dsa/tag_ocelot.c
> @@ -160,11 +160,14 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
>  	packing(injection, &qos_class, 19,  17, OCELOT_TAG_LEN, PACK, 0);
>  
>  	if (ocelot->ptp && (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
> +		struct sk_buff *clone = DSA_SKB_CB(skb)->clone;
> +
>  		rew_op = ocelot_port->ptp_cmd;
> -		if (ocelot_port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP) {
> -			rew_op |= (ocelot_port->ts_id  % 4) << 3;
> -			ocelot_port->ts_id++;
> -		}
> +		/* Retrieve timestamp ID populated inside skb->cb[0] of the
> +		 * clone by ocelot_port_add_txtstamp_skb
> +		 */
> +		if (ocelot_port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP)
> +			rew_op |= clone->cb[0] << 3;
>  
>  		packing(injection, &rew_op, 125, 117, OCELOT_TAG_LEN, PACK, 0);
>  	}
> -- 
> 2.25.1
>
Vladimir Oltean Sept. 16, 2020, 12:25 p.m. UTC | #4
On Wed, Sep 16, 2020 at 01:12:04PM +0200, Alexandre Belloni wrote:
> > +	for (port = 0; port < ocelot->num_phys_ports; port++) {
> > +		struct ocelot_port *ocelot_port = ocelot->ports[port];
> > +
> 
> At this point, ocelot_port is NULL because ocelot_init is called before
> the port structures are allocated in mscc_ocelot_probe

Yikes, you're right, thanks for testing!

-Vladimir
David Miller Sept. 17, 2020, 12:19 a.m. UTC | #5
From: Vladimir Oltean <olteanv@gmail.com>

Date: Tue, 15 Sep 2020 21:22:24 +0300

> This is a problem because, at least theoretically, another timestampable

> skb might use the same ocelot_port->ts_id before that is incremented. So

> the logic of using and incrementing the timestamp id should be atomic

> per port.


Have you actually observed this race in practice?

All transmit calls are serialized by the netdev transmit spinlock.

Let's not add locking if it is not actually necessary.
Vladimir Oltean Sept. 17, 2020, 11:43 p.m. UTC | #6
On Wed, Sep 16, 2020 at 05:19:26PM -0700, David Miller wrote:
> From: Vladimir Oltean <olteanv@gmail.com>
> Date: Tue, 15 Sep 2020 21:22:24 +0300
>
> > This is a problem because, at least theoretically, another timestampable
> > skb might use the same ocelot_port->ts_id before that is incremented. So
> > the logic of using and incrementing the timestamp id should be atomic
> > per port.
>
> Have you actually observed this race in practice?
>
> All transmit calls are serialized by the netdev transmit spinlock.
>
> Let's not add locking if it is not actually necessary.

It's a bit more complicated.
This code is also used from DSA, and DSA now declares NETIF_F_LLTX.
David Miller Sept. 17, 2020, 11:57 p.m. UTC | #7
From: Vladimir Oltean <olteanv@gmail.com>

Date: Fri, 18 Sep 2020 02:43:40 +0300

> On Wed, Sep 16, 2020 at 05:19:26PM -0700, David Miller wrote:

>> From: Vladimir Oltean <olteanv@gmail.com>

>> Date: Tue, 15 Sep 2020 21:22:24 +0300

>>

>> > This is a problem because, at least theoretically, another timestampable

>> > skb might use the same ocelot_port->ts_id before that is incremented. So

>> > the logic of using and incrementing the timestamp id should be atomic

>> > per port.

>>

>> Have you actually observed this race in practice?

>>

>> All transmit calls are serialized by the netdev transmit spinlock.

>>

>> Let's not add locking if it is not actually necessary.

> 

> It's a bit more complicated.

> This code is also used from DSA, and DSA now declares NETIF_F_LLTX.


Please document that in the commit log message, thank you.