[bnx2,Regression,4.8] Driver loading fails without firmware

Message ID f055c961-58ca-ea81-46d8-610fa055cce8@molgen.mpg.de
State New

Commit Message

Paul Menzel Oct. 27, 2016, 1:21 p.m.
Dear Baoquan,


On 10/26/16 14:00, Baoquan He wrote:

> On 10/26/16 at 12:31pm, Paul Menzel wrote:

>>>>         dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG |
>>>> @@ -8607,6 +8608,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>>>         return 0;
>>>>
>>>>  error:
>>>> +       bnx2_release_firmware(bp);
>>>>         pci_iounmap(pdev, bp->regview);
>>>>         pci_release_regions(pdev);
>>>>         pci_disable_device(pdev);

>>

>> Baoquan, could you please fix this regression. My suggestion is, that you

>> add the old code back, but check if the firmware has been loaded. If it

>> hasn’t, load it again.

>>

>> That way, people can update their Linux kernel, and it continues working

>> without changing the initramfs, or anything else.

>

> I saw your mail but I am also not familiar with bnx2 driver. As the

> commit log says I just tried to make bnx2 driver reset itself earlier.

>

> So you did a git bisect and found this commit caused the regression,

> right? If yes, and network developers have no action, I will look into

> the code and see if I have idea to fix it.


Well, I looked through the commits and found that one, which would 
explain the changed behavior.

To be sure, and to follow your request, I took Linux 4.8.4 and reverted 
your commit (attached). Then I deleted the firmware again from the 
initramfs, and rebooted. The devices showed up just fine as before.

So to summarize, the commit is indeed the culprit.

Thank you for looking into this.


Kind regards,

Paul

Comments

Rasesh Mody Oct. 27, 2016, 6:16 p.m. | #1
> From: dept_hsg_linux_nic_dev-bounces@qlclistserver.qlogic.com

> [mailto:dept_hsg_linux_nic_dev-bounces@qlclistserver.qlogic.com] On

> Behalf Of Paul Menzel

> 

> Dear Baoquan,

> 

> 

> On 10/26/16 14:00, Baoquan He wrote:

> 

> > On 10/26/16 at 12:31pm, Paul Menzel wrote:

> >>>>         dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG |
> >>>> @@ -8607,6 +8608,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> >>>>         return 0;
> >>>>
> >>>>  error:
> >>>> +       bnx2_release_firmware(bp);
> >>>>         pci_iounmap(pdev, bp->regview);
> >>>>         pci_release_regions(pdev);
> >>>>         pci_disable_device(pdev);

> >>

> >> Baoquan, could you please fix this regression. My suggestion is, that

> >> you add the old code back, but check if the firmware has been loaded.

> >> If it hasn’t, load it again.

> >>

> >> That way, people can update their Linux kernel, and it continues

> >> working without changing the initramfs, or anything else.

> >

> > I saw your mail but I am also not familiar with bnx2 driver. As the

> > commit log says I just tried to make bnx2 driver reset itself earlier.

> >

> > So you did a git bisect and found this commit caused the regression,

> > right? If yes, and network developers have no action, I will look into

> > the code and see if I have idea to fix it.

> 

> Well, I looked through the commits and found that one, which would explain

> the changed behavior.

> 

> To be sure, and to follow your request, I took Linux 4.8.4 and reverted your

> commit (attached). Then I deleted the firmware again from the initramfs,

> and rebooted. The devices showed up just fine as before.


Thanks Paul!
Acked-by: Rasesh Mody <Rasesh.Mody@cavium.com>


> So to summarize, the commit is indeed the culprit.

> 

> Thank you for looking into this.

> 

> 

> Kind regards,

> 

> Paul
Baoquan He Oct. 29, 2016, 2:55 a.m. | #2
On 10/27/16 at 03:21pm, Paul Menzel wrote:
> Dear Baoquan,

> > > Baoquan, could you please fix this regression. My suggestion is, that you

> > > add the old code back, but check if the firmware has been loaded. If it

> > > hasn’t, load it again.

> > > 

> > > That way, people can update their Linux kernel, and it continues working

> > > without changing the initramfs, or anything else.

> > 

> > I saw your mail but I am also not familiar with bnx2 driver. As the

> > commit log says I just tried to make bnx2 driver reset itself earlier.

> > 

> > So you did a git bisect and found this commit caused the regression,

> > right? If yes, and network developers have no action, I will look into

> > the code and see if I have idea to fix it.

> 

> Well, I looked through the commits and found that one, which would explain

> the changed behavior.

> 

> To be sure, and to follow your request, I took Linux 4.8.4 and reverted your

> commit (attached). Then I deleted the firmware again from the initramfs, and

> rebooted. The devices showed up just fine as before.

> 

> So to summarize, the commit is indeed the culprit.


Hi Paul,

Sorry for this.

Could you tell me the steps to reproduce? I will find a machine with a
bnx2 NIC and check whether there are other ways to fix it.

Thanks
Baoquan

> From 61b8dac8796343a797858b4a2eb0a59a0cfcd735 Mon Sep 17 00:00:00 2001
> From: Paul Menzel <pmenzel@molgen.mpg.de>
> Date: Thu, 27 Oct 2016 11:34:52 +0200
> Subject: [PATCH] Revert "bnx2: Reset device during driver initialization"
> 
> This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c.
> ---
>  drivers/net/ethernet/broadcom/bnx2.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
> index 27f11a5..ecd357d 100644
> --- a/drivers/net/ethernet/broadcom/bnx2.c
> +++ b/drivers/net/ethernet/broadcom/bnx2.c
> @@ -6356,6 +6356,10 @@ bnx2_open(struct net_device *dev)
>  	struct bnx2 *bp = netdev_priv(dev);
>  	int rc;
>  
> +	rc = bnx2_request_firmware(bp);
> +	if (rc < 0)
> +		goto out;
> +
>  	netif_carrier_off(dev);
>  
>  	bnx2_disable_int(bp);
> @@ -6424,6 +6428,7 @@ bnx2_open(struct net_device *dev)
>  	bnx2_free_irq(bp);
>  	bnx2_free_mem(bp);
>  	bnx2_del_napi(bp);
> +	bnx2_release_firmware(bp);
>  	goto out;
>  }
>  
> @@ -8570,12 +8575,6 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>  	pci_set_drvdata(pdev, dev);
>  
> -	rc = bnx2_request_firmware(bp);
> -	if (rc < 0)
> -		goto error;
> -
> -
> -	bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET);
>  	memcpy(dev->dev_addr, bp->mac_addr, ETH_ALEN);
>  
>  	dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG |
> @@ -8608,7 +8607,6 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	return 0;
>  
>  error:
> -	bnx2_release_firmware(bp);
>  	pci_iounmap(pdev, bp->regview);
>  	pci_release_regions(pdev);
>  	pci_disable_device(pdev);
> -- 
> 2.4.1
>
Paul Menzel Oct. 30, 2016, 11:05 a.m. | #3
Dear Baoquan,


Am Samstag, den 29.10.2016, 10:55 +0800 schrieb Baoquan He:
> On 10/27/16 at 03:21pm, Paul Menzel wrote:


> > > > Baoquan, could you please fix this regression. My suggestion is, that you

> > > > add the old code back, but check if the firmware has been loaded. If it

> > > > hasn’t, load it again.

> > > > 

> > > > That way, people can update their Linux kernel, and it continues working

> > > > without changing the initramfs, or anything else.

> > > 

> > > I saw your mail but I am also not familiar with bnx2 driver. As the

> > > commit log says I just tried to make bnx2 driver reset itself earlier.

> > > 

> > > So you did a git bisect and found this commit caused the regression,

> > > right? If yes, and network developers have no action, I will look into

> > > the code and see if I have idea to fix it.

> > 

> > Well, I looked through the commits and found that one, which would explain

> > the changed behavior.

> > 

> > To be sure, and to follow your request, I took Linux 4.8.4 and reverted your

> > commit (attached). Then I deleted the firmware again from the initramfs, and

> > rebooted. The devices showed up just fine as before.

> > 

> > So to summarize, the commit is indeed the culprit.


> Sorry for this.

> 

> Could you tell the steps to reproduce? I will find a machine with bnx2

> NIC and check if there's other ways.


Well, delete the bnx2 firmware files from the initramfs, and start the
system.

Did you read my proposal to request the firmware in both places? That
is, basically revert only the deleted lines of your commit, and add a
check so that the firmware is loaded only if it has not been loaded yet.
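
For illustration only, the proposal could be sketched roughly as below.
This is a user-space mock, not driver code: `struct fake_bnx2`,
`fake_request_firmware()`, `fake_init_one()` and `fake_open()` are
hypothetical stand-ins for the corresponding bnx2 pieces, and the
`fw_available` flag merely simulates whether the firmware file can be
found. Letting the probe continue when the early request fails is an
assumption of this sketch; the thread does not spell that detail out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bnx2; only firmware state is modeled. */
struct fake_bnx2 {
	const void *firmware;	/* NULL until firmware has been loaded */
	bool fw_available;	/* simulates: is the file reachable right now? */
};

static const char fake_blob[] = "fw";

/* The "additional check": do nothing if firmware is already loaded. */
static int fake_request_firmware(struct fake_bnx2 *bp)
{
	if (bp->firmware)
		return 0;
	if (!bp->fw_available)
		return -ENOENT;
	bp->firmware = fake_blob;
	return 0;
}

/* Probe time: try early, but (assumption) do not fail if it is missing. */
static int fake_init_one(struct fake_bnx2 *bp)
{
	(void)fake_request_firmware(bp);	/* best effort, retried in open */
	return 0;
}

/* Open time: firmware is required now, so retry and report failure. */
static int fake_open(struct fake_bnx2 *bp)
{
	int rc = fake_request_firmware(bp);

	if (rc < 0)
		return rc;
	return 0;
}
```

With this shape, a system whose initramfs lacks the firmware would still
register the device at probe time, and the load would succeed later at
open time once the root filesystem is mounted.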


Kind regards,

Paul
Baoquan He Oct. 31, 2016, 3:59 a.m. | #4
Hi Paul,

On 10/30/16 at 12:05pm, Paul Menzel wrote:
> Dear Baoquan,

> 

> 

> Am Samstag, den 29.10.2016, 10:55 +0800 schrieb Baoquan He:

> > On 10/27/16 at 03:21pm, Paul Menzel wrote:

> 

> > > > > Baoquan, could you please fix this regression. My suggestion is, that you

> > > > > add the old code back, but check if the firmware has been loaded. If it

> > > > > hasn’t, load it again.

> > > > > 

> > > > > That way, people can update their Linux kernel, and it continues working

> > > > > without changing the initramfs, or anything else.

> > > > 

> > > > I saw your mail but I am also not familiar with bnx2 driver. As the

> > > > commit log says I just tried to make bnx2 driver reset itself earlier.

> > > > 

> > > > So you did a git bisect and found this commit caused the regression,

> > > > right? If yes, and network developers have no action, I will look into

> > > > the code and see if I have idea to fix it.

> > > 

> > > Well, I looked through the commits and found that one, which would explain

> > > the changed behavior.

> > > 

> > > To be sure, and to follow your request, I took Linux 4.8.4 and reverted your

> > > commit (attached). Then I deleted the firmware again from the initramfs, and

> > > rebooted. The devices showed up just fine as before.

> > > 

> > > So to summarize, the commit is indeed the culprit.

> 

> > Sorry for this.

> > 

> > Could you tell the steps to reproduce? I will find a machine with bnx2

> > NIC and check if there's other ways.

> 

> Well, delete the bnx2 firmware files from the initramfs, and start the

> system.

> 

> Did you read my proposal, to try to load the firmware twice, that means,

> basically revert only the deleted lines of your commit, and add an

> additional check?


Thanks for your information!

I got an x86_64 system with a bnx2 NIC and cloned Linus's git tree onto
it. Then I built a new 4.9.0-rc3+ kernel with a new initramfs. But when
I unpacked the new initramfs, I found no bnx2 firmware: there are no
bnx2 files under lib/firmware in the unpacked initramfs folder, although
I do see them in /lib/firmware/bnx2/bnx2-xxxxx.fw on the system. Could
you please describe more specifically what I should do to reproduce the
failure you encountered? I think your proposal looks good; it just needs
a test before I post it.

Thanks
Baoquan
Baoquan He Oct. 31, 2016, 6:38 a.m. | #5
On 10/31/16 at 11:59am, Baoquan He wrote:
> Hi Paul,

> 

> On 10/30/16 at 12:05pm, Paul Menzel wrote:

> > Dear Baoquan,

> > 

> > 

> > Am Samstag, den 29.10.2016, 10:55 +0800 schrieb Baoquan He:

> > > On 10/27/16 at 03:21pm, Paul Menzel wrote:

> > 

> > > > > > Baoquan, could you please fix this regression. My suggestion is, that you

> > > > > > add the old code back, but check if the firmware has been loaded. If it

> > > > > > hasn’t, load it again.

> > > > > > 

> > > > > > That way, people can update their Linux kernel, and it continues working

> > > > > > without changing the initramfs, or anything else.

> > > > > 

> > > > > I saw your mail but I am also not familiar with bnx2 driver. As the

> > > > > commit log says I just tried to make bnx2 driver reset itself earlier.

> > > > > 

> > > > > So you did a git bisect and found this commit caused the regression,

> > > > > right? If yes, and network developers have no action, I will look into

> > > > > the code and see if I have idea to fix it.

> > > > 

> > > > Well, I looked through the commits and found that one, which would explain

> > > > the changed behavior.

> > > > 

> > > > To be sure, and to follow your request, I took Linux 4.8.4 and reverted your

> > > > commit (attached). Then I deleted the firmware again from the initramfs, and

> > > > rebooted. The devices showed up just fine as before.

> > > > 

> > > > So to summarize, the commit is indeed the culprit.

> > 

> > > Sorry for this.

> > > 

> > > Could you tell the steps to reproduce? I will find a machine with bnx2

> > > NIC and check if there's other ways.

> > 

> > Well, delete the bnx2 firmware files from the initramfs, and start the

> > system.

> > 

> > Did you read my proposal, to try to load the firmware twice, that means,

> > basically revert only the deleted lines of your commit, and add an

> > additional check?

> 

Please ignore this one; I have reproduced it. I will post a fix after
testing.
> 

> I got a x86_64 system with bnx2 NIC, and clone Linus's git tree into

> that system. Then building a new kernel 4.9.0-rc3+ with new initramfs.

> But when I uncompressed the new initramfs, didn't find bnx2 related

> firmware, no bnx2 files under lib/firmware of uncompressed initramfs

> folder. While I did see them in /lib/firmware/bnx2/bnx2-xxxxx.fw. Could

> you please say it more specifically how I should do to reproduce the

> failure you encountered? I think your proposal looks good, just need a

> test before post.

> 

> Thanks

> Baoquan

Patch

From 61b8dac8796343a797858b4a2eb0a59a0cfcd735 Mon Sep 17 00:00:00 2001
From: Paul Menzel <pmenzel@molgen.mpg.de>
Date: Thu, 27 Oct 2016 11:34:52 +0200
Subject: [PATCH] Revert "bnx2: Reset device during driver initialization"

This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c.
---
 drivers/net/ethernet/broadcom/bnx2.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 27f11a5..ecd357d 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -6356,6 +6356,10 @@  bnx2_open(struct net_device *dev)
 	struct bnx2 *bp = netdev_priv(dev);
 	int rc;
 
+	rc = bnx2_request_firmware(bp);
+	if (rc < 0)
+		goto out;
+
 	netif_carrier_off(dev);
 
 	bnx2_disable_int(bp);
@@ -6424,6 +6428,7 @@  bnx2_open(struct net_device *dev)
 	bnx2_free_irq(bp);
 	bnx2_free_mem(bp);
 	bnx2_del_napi(bp);
+	bnx2_release_firmware(bp);
 	goto out;
 }
 
@@ -8570,12 +8575,6 @@  bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	pci_set_drvdata(pdev, dev);
 
-	rc = bnx2_request_firmware(bp);
-	if (rc < 0)
-		goto error;
-
-
-	bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET);
 	memcpy(dev->dev_addr, bp->mac_addr, ETH_ALEN);
 
 	dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG |
@@ -8608,7 +8607,6 @@  bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return 0;
 
 error:
-	bnx2_release_firmware(bp);
 	pci_iounmap(pdev, bp->regview);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
-- 
2.4.1