[00/14] Multiple improvement to qca8k stability

Message ID 20210423014741.11858-1-ansuelsmth@gmail.com

Message

Christian Marangi April 23, 2021, 1:47 a.m. UTC
Currently qca8337 switches are widely used on ipq8064 based routers.
On these routers the switch was found to be very unstable: ports with
no link were detected as linked with unknown speed, ports dropped
randomly, and the switch was generally unreliable. Lots of testing and
comparison between this dsa driver and the original qsdk driver showed
that some additional delays and values were missing. One main
difference emerged between the original driver and the dsa one: the
original driver didn't use the MASTER regs to read phy status, and the
dedicated mdio driver worked correctly. Now that the dsa driver actually
uses these regs, it was found that these special read/write operations
require mutual exclusion with the normal qca8k_read/write operations.
Adding a mutex for these operations fixed the random port dropping
except on ports with an actual link, which still dropped randomly.
Adding an additional delay to the set_page operation and fixing a bug
in the dedicated mdio driver fixed this remaining problem as well. The
current driver also requires more time to apply the vlan switch.
Together, these changes and tweaks make for a very stable and reliable
dsa driver with 0 port drops. This series has been tested by at least
5 users with different routers, and all report positive results and no
problems.
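
The mutual exclusion described above can be illustrated with a small
userspace sketch. This is not the driver code: pthread_mutex_t stands
in for the kernel mutex, and bus_lock, fake_regs, fake_page, sw_read()
and master_write() are hypothetical names invented for the example. The
only point is that the multi-step MASTER-style access and the normal
paged access take the same lock.

```c
#include <pthread.h>
#include <stdint.h>

/* One lock shared by every path that touches the bus. */
static pthread_mutex_t bus_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-ins for the hardware: a paged register file and
 * the page currently selected on the bus.
 */
static uint32_t fake_regs[4][8];
static unsigned int fake_page;

/* Normal register access: select a page, then read a register in it. */
static uint32_t sw_read(unsigned int page, unsigned int reg)
{
	uint32_t val;

	pthread_mutex_lock(&bus_lock);
	fake_page = page;
	val = fake_regs[fake_page][reg];
	pthread_mutex_unlock(&bus_lock);

	return val;
}

/* MASTER-style access: a multi-step sequence that must not interleave
 * with sw_read(), so it holds the same lock for its whole duration.
 */
static void master_write(unsigned int reg, uint32_t val)
{
	pthread_mutex_lock(&bus_lock);
	fake_page = 0;                    /* step 1: select the control page */
	fake_regs[fake_page][reg] = val;  /* step 2: issue the command */
	pthread_mutex_unlock(&bus_lock);
}
```

Without the shared lock, a sw_read() running between step 1 and step 2
could change the selected page under the MASTER sequence, which is the
kind of interleaving the series guards against.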

Ansuel Smith (14):
  drivers: net: dsa: qca8k: handle error with set_page
  drivers: net: dsa: qca8k: tweak internal delay to oem spec
  drivers: net: mdio: mdio-ip8064: improve busy wait delay
  drivers: net: dsa: qca8k: apply suggested packet priority
  drivers: net: dsa: qca8k: add support for qca8327 switch
  devicetree: net: dsa: qca8k: Document new compatible qca8327
  drivers: net: dsa: qca8k: limit priority tweak to qca8337 switch
  drivers: net: dsa: qca8k: add GLOBAL_FC settings needed for qca8327
  drivers: net: dsa: qca8k: add support for switch rev
  drivers: net: dsa: qca8k: add support for specific QCA access function
  drivers: net: dsa: qca8k: apply switch revision fix
  drivers: net: dsa: qca8k: clear MASTER_EN after phy read/write
  drivers: net: dsa: qca8k: protect MASTER busy_wait with mdio mutex
  drivers: net: dsa: qca8k: enlarge mdio delay and timeout

 .../devicetree/bindings/net/dsa/qca8k.txt     |   1 +
 drivers/net/dsa/qca8k.c                       | 256 ++++++++++++++++--
 drivers/net/dsa/qca8k.h                       |  54 +++-
 drivers/net/mdio/mdio-ipq8064.c               |  36 ++-
 4 files changed, 304 insertions(+), 43 deletions(-)

Comments

Florian Fainelli April 23, 2021, 1:52 a.m. UTC | #1
On 4/22/2021 6:47 PM, Ansuel Smith wrote:
> Better handle function qca8k_set_page. The original code requires a
> deleay of 5us and set the current page only if the bus write has not
> failed.
> 
> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
> ---
>  drivers/net/dsa/qca8k.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
> index cdaf9f85a2cb..a6d35b825c0e 100644
> --- a/drivers/net/dsa/qca8k.c
> +++ b/drivers/net/dsa/qca8k.c
> @@ -133,9 +133,12 @@ qca8k_set_page(struct mii_bus *bus, u16 page)
>  	if (page == qca8k_current_page)
>  		return;
>  
> -	if (bus->write(bus, 0x18, 0, page) < 0)
> +	if (bus->write(bus, 0x18, 0, page)) {
>  		dev_err_ratelimited(&bus->dev,
>  				    "failed to set qca8k page\n");
> +		return;
> +	}

An improvement would be to propagate the return value to the two
callers, which themselves do allow an error to be propagated, no? If
you cannot set the page you are pretty much toast.
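
A minimal userspace sketch of that suggestion, assuming the page setter
is changed to return int. struct fake_bus, set_page(), read_reg() and
the two write callbacks are hypothetical stand-ins for the example, not
the driver's real types or functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct mii_bus; only the
 * write callback is modelled here.
 */
struct fake_bus {
	int (*write)(struct fake_bus *bus, int addr, int regnum, uint16_t val);
};

static uint16_t current_page = 0xffff;

/* Return 0 on success or the bus error, instead of returning void. */
static int set_page(struct fake_bus *bus, uint16_t page)
{
	int ret;

	if (page == current_page)
		return 0;

	ret = bus->write(bus, 0x18, 0, page);
	if (ret) {
		fprintf(stderr, "failed to set page\n");
		return ret;
	}

	/* Cache the page only after the write succeeded. */
	current_page = page;
	return 0;
}

/* A caller that already returns int can pass the error up unchanged. */
static int read_reg(struct fake_bus *bus, uint16_t page, uint32_t *val)
{
	int ret = set_page(bus, page);

	if (ret)
		return ret;

	*val = 0; /* the real register read would go here */
	return 0;
}

/* Two fake write callbacks for trying the sketch out. */
static int write_ok(struct fake_bus *bus, int addr, int regnum, uint16_t val)
{
	(void)bus; (void)addr; (void)regnum; (void)val;
	return 0;
}

static int write_fail(struct fake_bus *bus, int addr, int regnum, uint16_t val)
{
	(void)bus; (void)addr; (void)regnum; (void)val;
	return -5; /* mimics -EIO */
}
```

With write_fail installed, read_reg() returns -5 and current_page is
left untouched, so a later call retries the page write rather than
trusting a stale cache.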
Christian Marangi April 23, 2021, 2:03 a.m. UTC | #2
On Thu, Apr 22, 2021 at 06:56:34PM -0700, Florian Fainelli wrote:
> 
> 
> On 4/22/2021 6:47 PM, Ansuel Smith wrote:
> > With the use of the qca8k dsa driver, some problem arised related to
> > port status detection. With a load on a specific port (for example a
> > simple speed test), the driver starts to bheave in a strange way and
> 
> s/bheave/behave/
> 
> > garbage data is produced. To address this, enlarge the sleep delay and
> > address a bug for the reg offset 31 that require additional delay for
> > this specific reg.
> > 
> > Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
> > ---
> >  drivers/net/mdio/mdio-ipq8064.c | 36 ++++++++++++++++++++-------------
> >  1 file changed, 22 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/net/mdio/mdio-ipq8064.c b/drivers/net/mdio/mdio-ipq8064.c
> > index 1bd18857e1c5..5bd6d0501642 100644
> > --- a/drivers/net/mdio/mdio-ipq8064.c
> > +++ b/drivers/net/mdio/mdio-ipq8064.c
> > @@ -15,25 +15,26 @@
> >  #include <linux/mfd/syscon.h>
> >  
> >  /* MII address register definitions */
> > -#define MII_ADDR_REG_ADDR                       0x10
> > -#define MII_BUSY                                BIT(0)
> > -#define MII_WRITE                               BIT(1)
> > -#define MII_CLKRANGE_60_100M                    (0 << 2)
> > -#define MII_CLKRANGE_100_150M                   (1 << 2)
> > -#define MII_CLKRANGE_20_35M                     (2 << 2)
> > -#define MII_CLKRANGE_35_60M                     (3 << 2)
> > -#define MII_CLKRANGE_150_250M                   (4 << 2)
> > -#define MII_CLKRANGE_250_300M                   (5 << 2)
> > +#define MII_ADDR_REG_ADDR			0x10
> > +#define MII_BUSY				BIT(0)
> > +#define MII_WRITE				BIT(1)
> > +#define MII_CLKRANGE(x)				((x) << 2)
> > +#define MII_CLKRANGE_60_100M			MII_CLKRANGE(0)
> > +#define MII_CLKRANGE_100_150M			MII_CLKRANGE(1)
> > +#define MII_CLKRANGE_20_35M			MII_CLKRANGE(2)
> > +#define MII_CLKRANGE_35_60M			MII_CLKRANGE(3)
> > +#define MII_CLKRANGE_150_250M			MII_CLKRANGE(4)
> > +#define MII_CLKRANGE_250_300M			MII_CLKRANGE(5)
> >  #define MII_CLKRANGE_MASK			GENMASK(4, 2)
> >  #define MII_REG_SHIFT				6
> >  #define MII_REG_MASK				GENMASK(10, 6)
> >  #define MII_ADDR_SHIFT				11
> >  #define MII_ADDR_MASK				GENMASK(15, 11)
> >  
> > -#define MII_DATA_REG_ADDR                       0x14
> > +#define MII_DATA_REG_ADDR			0x14
> >  
> > -#define MII_MDIO_DELAY_USEC                     (1000)
> > -#define MII_MDIO_RETRY_MSEC                     (10)
> > +#define MII_MDIO_DELAY_USEC			(1000)
> > +#define MII_MDIO_RETRY_MSEC			(10)
> 
> These changes are not related to what you are doing and are just
> whitespace cleaning, better not to mix them with functional changes.
>

Ok will send them in a different patch.

> >  
> >  struct ipq8064_mdio {
> >  	struct regmap *base; /* NSS_GMAC0_BASE */
> > @@ -65,7 +66,7 @@ ipq8064_mdio_read(struct mii_bus *bus, int phy_addr, int reg_offset)
> >  		   ((reg_offset << MII_REG_SHIFT) & MII_REG_MASK);
> >  
> >  	regmap_write(priv->base, MII_ADDR_REG_ADDR, miiaddr);
> > -	usleep_range(8, 10);
> > +	usleep_range(10, 13);
> >  
> >  	err = ipq8064_mdio_wait_busy(priv);
> >  	if (err)
> > @@ -91,7 +92,14 @@ ipq8064_mdio_write(struct mii_bus *bus, int phy_addr, int reg_offset, u16 data)
> >  		   ((reg_offset << MII_REG_SHIFT) & MII_REG_MASK);
> >  
> >  	regmap_write(priv->base, MII_ADDR_REG_ADDR, miiaddr);
> > -	usleep_range(8, 10);
> > +
> > +	/* For the specific reg 31 extra time is needed or the next
> > +	 * read will produce grabage data.
> 
> s/grabage/garbage/
> 
> > +	 */
> > +	if (reg_offset == 31)
> > +		usleep_range(30, 43);
> > +	else
> > +		usleep_range(10, 13);
> 
> This is just super weird, presumably register 31 needs to be conditional
> to the PHY, or pseudo-PHY being driven here. Not that it would harm, but
> waiting an extra 30 to 43 microseconds with a Marvell PHY or Broadcom
> PHY or from another vendor would not be necessary.
>

Any idea how to check this? I found this by printing every value written
to and read from the mdio driver, and noticed it only with this reg.
With the extra delay the problem is solved; without it, the very next
read produces bad data. Maybe some kind of specific binding would be
useful here, something like 'qcom,extra-delay-31'? (Feel free to suggest
a better name since I'm very bad at them.)
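
One way the conditional delay could look, as a userspace sketch only.
struct mdio_cfg, struct delay_range and mdio_write_delay() are
hypothetical; the flag is assumed to be filled in at probe time from a
DT property such as the 'qcom,extra-delay-31' floated above, which is a
suggestion in this thread and not an existing binding. The delay values
are the ones from the patch.

```c
#include <stdbool.h>

/* Min/max bounds in microseconds, as they would be passed to
 * usleep_range().
 */
struct delay_range {
	unsigned int min_us;
	unsigned int max_us;
};

/* Hypothetical per-bus configuration, set once at probe time. */
struct mdio_cfg {
	bool extra_delay_reg31;
};

/* Pick the post-write delay: the longer one only for register 31, and
 * only on buses that asked for it.
 */
static struct delay_range mdio_write_delay(const struct mdio_cfg *cfg,
					   int reg_offset)
{
	struct delay_range normal = { 10, 13 };
	struct delay_range extended = { 30, 43 };

	if (cfg->extra_delay_reg31 && reg_offset == 31)
		return extended;

	return normal;
}
```

PHYs from other vendors on the same bus type would then keep the short
delay unless their device tree opts in.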

> >  
> >  	return ipq8064_mdio_wait_busy(priv);
> >  }
> > 
> 
> -- 
> Florian
Andrew Lunn April 23, 2021, 12:42 p.m. UTC | #3
> @@ -1467,11 +1468,16 @@ qca8k_sw_probe(struct mdio_device *mdiodev)
>  		gpiod_set_value_cansleep(priv->reset_gpio, 0);
>  	}
>  
> +	/* get the switches ID from the compatible */
> +	data = of_device_get_match_data(&mdiodev->dev);
> +	if (!data)
> +		return -ENODEV;
> +
>  	/* read the switches ID register */
>  	id = qca8k_read(priv, QCA8K_REG_MASK_CTRL);
>  	id >>= QCA8K_MASK_CTRL_ID_S;
>  	id &= QCA8K_MASK_CTRL_ID_M;
> -	if (id != QCA8K_ID_QCA8337)
> +	if (id != data->id)
>  		return -ENODEV;

It is useful to print an error message here: Found X, expected
Y. That gives the DT writer an idea of what they did wrong.

   Andrew
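
A small userspace sketch of the "found X, expected Y" check suggested
above. check_switch_id() and the message wording are hypothetical; in
the driver the message would presumably go through dev_err() rather than
into a buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Compare the ID read from the switch with the one implied by the DT
 * compatible. On mismatch, write a "detected X but expected Y" message
 * into buf and return -ENODEV; return 0 when the IDs match.
 */
static int check_switch_id(unsigned int found, unsigned int expected,
			   char *buf, size_t len)
{
	if (found == expected)
		return 0;

	snprintf(buf, len, "switch id detected 0x%x but expected 0x%x",
		 found, expected);
	return -ENODEV;
}
```

Reporting both values lets whoever wrote the device tree see at a glance
whether the wrong compatible was chosen for the board.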