diff mbox series

[5/5] net: ch9200: avoid triggering NWay restart on non-zero PHY ID

Message ID 20250412183829.41342-6-qasdev00@gmail.com
State New
Headers show
Series net: ch9200: fix various bugs and improve qinheng ch9200 driver | expand

Commit Message

Qasim Ijaz April 12, 2025, 6:38 p.m. UTC
During ch9200_mdio_read if the phy_id is not 0 -ENODEV is returned.

In certain cases such as in mii_nway_restart returning a negative such
as -ENODEV triggers the "bmcr & BMCR_ANENABLE" check, we should avoid 
this on error and just end the function.

To address this just return 0.

Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> 
---
 drivers/net/usb/ch9200.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Qasim Ijaz April 17, 2025, 1:12 p.m. UTC | #1
On Tue, Apr 15, 2025 at 08:56:48PM -0700, Jakub Kicinski wrote:
> On Tue, 15 Apr 2025 20:52:30 -0700 Jakub Kicinski wrote:
> > On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
> > > > @@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc)
> > > >  		   __func__, phy_id, loc);
> > > >  
> > > >  	if (phy_id != 0)
> > > > -		return -ENODEV;
> > > > +		return 0;    
> > > 
> > > An actually MDIO bus would return 0xffff is asked to read from a PHY
> > > which is not on the bus. But i've no idea how the ancient mii code
> > > handles this.
> > > 
> > > If this code every gets updated to using phylib, many of the changes
> > > you are making will need reverting because phylib actually wants to
> > > see the errors. So i'm somewhat reluctant to make changes like this.  
> > 
> > Right.
> > 
> > I mean most of the patches seem to be adding error checking, unlike
> > this one, but since Qasim doesn't have access to this HW they are
> > more likely to break stuff than fix. I'm going to apply the first
> > patch, Qasim if you'd like to clean up the rest I think it should
> > be done separately without the Fixes tags, if at all.
> 
> Ah, no, patch 1 also does return 0. Hm. Maybe let's propagate the real
> error to silence the syzbot error and if someone with access to the HW

Hi Andrew and Jakub

Since there is uncertainty on whether these patches would break things 
how about I refactor the patches to instead return what the function 
already returns, this way we include error handling but maintain consistency 
with what the function already returns and does so there is no chance of 
breaking stuff. I think including the error handling would be a good idea
overall because we have already seen 1 bug where the root cause is insufficient 
error handling right? Furthermore this driver has not been updated in 4 years, 
so for the near‑term surely improving these aspects can only be a good thing.

So now going into the changes:

Patch 1: So patch 1 changes mdio_read, we can see that on failure mdio_read 
already returns a negative of -ENODEV, so for the patch 1 change we can simply 
just error check control_read by "if (ret < 0) return ret;" This matches 
the fact that mdio_read already returns a negative so no chance of breaking anything.

Patch 2: For patch 2 I will add Cc stable to this patch since kernel test robot 
flagged it, I assume backporting it would be the right thing to do since the 
return statement here stops error propagation. Would you like me to add it to the other patches?

Patch 3: For patch 3 the get_mac_address and ch9200_bind function already returns a
negative on error so my changes don't change what is returned, so this should be fine i think.

Patch 4: For patch 4 it already returns a negative on error via usbnet_get_endpoints() 
so i hope it is fine as is? Jakub commented on the changed usbnet_get_endpoints() 
error check, if you want me to revert it back I can do that.

Patch 5: We can drop this.

Andrew and Jakub if you’re happy with this should I resend with these changes?

> comes along they can try to move this driver to more modern infra?
Andrew Lunn April 17, 2025, 2:08 p.m. UTC | #2
On Thu, Apr 17, 2025 at 02:12:36PM +0100, Qasim Ijaz wrote:
> On Tue, Apr 15, 2025 at 08:56:48PM -0700, Jakub Kicinski wrote:
> > On Tue, 15 Apr 2025 20:52:30 -0700 Jakub Kicinski wrote:
> > > On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
> > > > > @@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc)
> > > > >  		   __func__, phy_id, loc);
> > > > >  
> > > > >  	if (phy_id != 0)
> > > > > -		return -ENODEV;
> > > > > +		return 0;    
> > > > 
> > > > An actually MDIO bus would return 0xffff is asked to read from a PHY
> > > > which is not on the bus. But i've no idea how the ancient mii code
> > > > handles this.
> > > > 
> > > > If this code every gets updated to using phylib, many of the changes
> > > > you are making will need reverting because phylib actually wants to
> > > > see the errors. So i'm somewhat reluctant to make changes like this.  
> > > 
> > > Right.
> > > 
> > > I mean most of the patches seem to be adding error checking, unlike
> > > this one, but since Qasim doesn't have access to this HW they are
> > > more likely to break stuff than fix. I'm going to apply the first
> > > patch, Qasim if you'd like to clean up the rest I think it should
> > > be done separately without the Fixes tags, if at all.
> > 
> > Ah, no, patch 1 also does return 0. Hm. Maybe let's propagate the real
> > error to silence the syzbot error and if someone with access to the HW
> 
> Hi Andrew and Jakub
> 
> Since there is uncertainty on whether these patches would break things 
> how about I refactor the patches to instead return what the function 
> already returns, this way we include error handling but maintain consistency 
> with what the function already returns and does so there is no chance of 
> breaking stuff. I think including the error handling would be a good idea
> overall because we have already seen 1 bug where the root cause is insufficient 
> error handling right? Furthermore this driver has not been updated in 4 years, 
> so for the near‑term surely improving these aspects can only be a good thing.

It is not a simple thing to decided if we should make changes or not,
if we don't have the hardware. The test robot is saying things are
potentially wrong, but we don't have any users complaining it is
broken. If we make the test robot happy, without testing the changes,
we can make users unhappy by breaking it. And that is the opposite of
what we want.

We also need to think about "return on investment". Is anybody
actually using this device still? Would it be better to spend our time
on other devices we know are actually used?

If you can find a board which actually has this device, or can find
somebody to run tests, then great, we are likely to accept them. But
otherwise please focus on minimum low risk changes which are obviously
correct, or just leave the test robot unhappy.

	Andrew
Qasim Ijaz April 17, 2025, 5:27 p.m. UTC | #3
On Thu, Apr 17, 2025 at 04:08:08PM +0200, Andrew Lunn wrote:
> On Thu, Apr 17, 2025 at 02:12:36PM +0100, Qasim Ijaz wrote:
> > On Tue, Apr 15, 2025 at 08:56:48PM -0700, Jakub Kicinski wrote:
> > > On Tue, 15 Apr 2025 20:52:30 -0700 Jakub Kicinski wrote:
> > > > On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
> > > > > > @@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc)
> > > > > >  		   __func__, phy_id, loc);
> > > > > >  
> > > > > >  	if (phy_id != 0)
> > > > > > -		return -ENODEV;
> > > > > > +		return 0;    
> > > > > 
> > > > > An actually MDIO bus would return 0xffff is asked to read from a PHY
> > > > > which is not on the bus. But i've no idea how the ancient mii code
> > > > > handles this.
> > > > > 
> > > > > If this code every gets updated to using phylib, many of the changes
> > > > > you are making will need reverting because phylib actually wants to
> > > > > see the errors. So i'm somewhat reluctant to make changes like this.  
> > > > 
> > > > Right.
> > > > 
> > > > I mean most of the patches seem to be adding error checking, unlike
> > > > this one, but since Qasim doesn't have access to this HW they are
> > > > more likely to break stuff than fix. I'm going to apply the first
> > > > patch, Qasim if you'd like to clean up the rest I think it should
> > > > be done separately without the Fixes tags, if at all.
> > > 
> > > Ah, no, patch 1 also does return 0. Hm. Maybe let's propagate the real
> > > error to silence the syzbot error and if someone with access to the HW
> > 
> > Hi Andrew and Jakub
> > 
> > Since there is uncertainty on whether these patches would break things 
> > how about I refactor the patches to instead return what the function 
> > already returns, this way we include error handling but maintain consistency 
> > with what the function already returns and does so there is no chance of 
> > breaking stuff. I think including the error handling would be a good idea
> > overall because we have already seen 1 bug where the root cause is insufficient 
> > error handling right? Furthermore this driver has not been updated in 4 years, 
> > so for the near‑term surely improving these aspects can only be a good thing.
> 
> It is not a simple thing to decided if we should make changes or not,
> if we don't have the hardware. The test robot is saying things are
> potentially wrong, but we don't have any users complaining it is
> broken. If we make the test robot happy, without testing the changes,
> we can make users unhappy by breaking it. And that is the opposite of
> what we want.

For patch 2 the kernel test robot said it is missing a Cc stable tag in
the sign-off area, it didnt highlight any build or functional errors so
I don't understand what you mean there.

> 
> We also need to think about "return on investment". Is anybody
> actually using this device still? Would it be better to spend our time
> on other devices we know are actually used?
> 
> If you can find a board which actually has this device, or can find
> somebody to run tests, then great, we are likely to accept them. But
> otherwise please focus on minimum low risk changes which are obviously

So going forward what should we do? I gave my thoughts for each
patch above and how I think we should change it to minimise
breaking things while adding error handling, which ones do you 
agree/ don't agree with?

Thanks
Qasim
> correct, or just leave the test robot unhappy.
> 
> 	Andrew
Qasim Ijaz April 25, 2025, 10:13 a.m. UTC | #4
Hi Andrew, Jakub

Just pinging on my last message. Any thoughts on how to proceed with this patch series, I left my thoughts in the previous message.

Thanks,
Qasim
Andrew Lunn April 28, 2025, 2:22 p.m. UTC | #5
On Fri, Apr 25, 2025 at 11:13:12AM +0100, Qasim Ijaz wrote:
> Hi Andrew, Jakub
> 
> Just pinging on my last message. Any thoughts on how to proceed with
> this patch series, I left my thoughts in the previous message.

I would suggest you do the minimum, low risk changes. Don't be driven
to fix all the syzbot warnings just to make syzbot quiet. What really
matters is you don't break the driver for users. syzbot is secondary.

	Andrew
Qasim Ijaz April 30, 2025, 5:57 p.m. UTC | #6
On Mon, Apr 28, 2025 at 04:22:59PM +0200, Andrew Lunn wrote:
> On Fri, Apr 25, 2025 at 11:13:12AM +0100, Qasim Ijaz wrote:
> > Hi Andrew, Jakub
> > 
> > Just pinging on my last message. Any thoughts on how to proceed with
> > this patch series, I left my thoughts in the previous message.
> 
> I would suggest you do the minimum, low risk changes. Don't be driven
> to fix all the syzbot warnings just to make syzbot quiet. What really
> matters is you don't break the driver for users. syzbot is secondary.
> 

Right, got it so avoid breaking it at all costs, in that case should we move
forward with the syzbot fix and the "remove extraneous return that
prevents error propagation" patches only? 

For the syzbot one we will return a negative on control read failure,
as the function already does that when encountering an invalid phy_id.

As for the "remove extraneous return that prevents error 
propagation" change it seems like a simple low risk change 
from what I can tell (if not please let me know).

Would you guys be happy with this?

Thanks
Qasim
> 	Andrew
diff mbox series

Patch

diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c
index 187bbfc991f5..281800bb2ff2 100644
--- a/drivers/net/usb/ch9200.c
+++ b/drivers/net/usb/ch9200.c
@@ -182,7 +182,7 @@  static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc)
 		   __func__, phy_id, loc);
 
 	if (phy_id != 0)
-		return -ENODEV;
+		return 0;
 
 	ret = control_read(dev, REQUEST_READ, 0, loc * 2, buff, 0x02,
 			   CONTROL_TIMEOUT_MS);