diff mbox series

[RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition

Message ID 20230318081303.792969-1-zyytlz.wz@163.com
State New
Headers show
Series [RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition | expand

Commit Message

Zheng Wang March 18, 2023, 8:13 a.m. UTC
In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work
with qedi_recovery_handler and bound &qedi->board_disable_work
with qedi_board_disable_work.

When it calls qedi_schedule_recovery_handler, it will finally
call schedule_delayed_work to start the work.

When we call qedi_remove to remove the driver, there
may be a sequence as follows:

Fix it by finishing the work before cleanup in qedi_remove.

CPU0                  CPU1

                     |qedi_recovery_handler
qedi_remove          |
  __qedi_remove      |
iscsi_host_free      |
scsi_host_put        |
//free shost         |
                     |iscsi_host_for_each_session
                     |//use qedi->shost

Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
---
 drivers/scsi/qedi/qedi_main.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Manish Rangankar March 23, 2023, 10:17 a.m. UTC | #1
> -----Original Message-----
> From: Zheng Hacker <hackerzheng666@gmail.com>
> Sent: Thursday, March 23, 2023 9:15 AM
> To: Mike Christie <michael.christie@oracle.com>
> Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>;
> Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage-
> Upstream <GR-QLogic-Storage-Upstream@marvell.com>;
> jejb@linux.ibm.com; martin.petersen@oracle.com; linux-
> scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
> 1395428693sheep@gmail.com; alex000young@gmail.com
> Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> qedi_remove due to race condition
> 
> External Email
> 
> ----------------------------------------------------------------------
> Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写
> 道:
> >
> > On 3/18/23 3:13 AM, Zheng Wang wrote:
> > > In qedi_probe, it calls __qedi_probe, which bound
> > > &qedi->recovery_work with qedi_recovery_handler and bound
> > > &qedi->board_disable_work with qedi_board_disable_work.
> > >
> > > When it calls qedi_schedule_recovery_handler, it will finally call
> > > schedule_delayed_work to start the work.
> > >
> > > When we call qedi_remove to remove the driver, there may be a
> > > sequence as follows:
> > >
> > > Fix it by finishing the work before cleanup in qedi_remove.
> > >
> > > CPU0                  CPU1
> > >
> > >                      |qedi_recovery_handler
> > > qedi_remove          |
> > >   __qedi_remove      |
> > > iscsi_host_free      |
> > > scsi_host_put        |
> > > //free shost         |
> > >                      |iscsi_host_for_each_session
> > >                      |//use qedi->shost
> > >
> > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > > ---
> > >  drivers/scsi/qedi/qedi_main.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/scsi/qedi/qedi_main.c
> > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344
> > > 100644
> > > --- a/drivers/scsi/qedi/qedi_main.c
> > > +++ b/drivers/scsi/qedi/qedi_main.c
> > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev
> *pdev, int mode)
> > >       int rval;
> > >       u16 retry = 10;
> > >
> > > +     /*cancel work*/
> >
> > This comment is not needed. The name of the functions you are calling
> > have "cancel" and "work" in them so we know. If you want to add a
> > comment explain why the cancel calls are needed here.
> >
> 
> Hi,
> 
> Sorry for my late reply and thanks for your advice. Will remove it in the next
> version of patch.
> 
> >
> > > +     cancel_delayed_work_sync(&qedi->recovery_work);
> > > +     cancel_delayed_work_sync(&qedi->board_disable_work);
> >
> >
> > How do you know after you have called cancel_delayed_work_sync that
> > schedule_recovery_handler or schedule_hw_err_handler can't be called?
> > I don't know the qed driver well, but it looks like you could have
> > operations still running, so after you cancel here one of those ops
> > could lead to them scheduling the work again.
> >
> 
> Sorry I didn't know how to make sure there's no more schedule. But I do
> think this is important. Maybe there're someone else who can give us advice.
> 
> Best regards,
> Zheng
> >

Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and 
qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver.

Thanks,
Manish
Zheng Hacker March 24, 2023, 3:21 p.m. UTC | #2
Manish Rangankar <mrangankar@marvell.com> 于2023年3月23日周四 18:17写道:
>
>
>
> > -----Original Message-----
> > From: Zheng Hacker <hackerzheng666@gmail.com>
> > Sent: Thursday, March 23, 2023 9:15 AM
> > To: Mike Christie <michael.christie@oracle.com>
> > Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>;
> > Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage-
> > Upstream <GR-QLogic-Storage-Upstream@marvell.com>;
> > jejb@linux.ibm.com; martin.petersen@oracle.com; linux-
> > scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
> > 1395428693sheep@gmail.com; alex000young@gmail.com
> > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> > qedi_remove due to race condition
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写
> > 道:
> > >
> > > On 3/18/23 3:13 AM, Zheng Wang wrote:
> > > > In qedi_probe, it calls __qedi_probe, which bound
> > > > &qedi->recovery_work with qedi_recovery_handler and bound
> > > > &qedi->board_disable_work with qedi_board_disable_work.
> > > >
> > > > When it calls qedi_schedule_recovery_handler, it will finally call
> > > > schedule_delayed_work to start the work.
> > > >
> > > > When we call qedi_remove to remove the driver, there may be a
> > > > sequence as follows:
> > > >
> > > > Fix it by finishing the work before cleanup in qedi_remove.
> > > >
> > > > CPU0                  CPU1
> > > >
> > > >                      |qedi_recovery_handler
> > > > qedi_remove          |
> > > >   __qedi_remove      |
> > > > iscsi_host_free      |
> > > > scsi_host_put        |
> > > > //free shost         |
> > > >                      |iscsi_host_for_each_session
> > > >                      |//use qedi->shost
> > > >
> > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > > > ---
> > > >  drivers/scsi/qedi/qedi_main.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/drivers/scsi/qedi/qedi_main.c
> > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344
> > > > 100644
> > > > --- a/drivers/scsi/qedi/qedi_main.c
> > > > +++ b/drivers/scsi/qedi/qedi_main.c
> > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev
> > *pdev, int mode)
> > > >       int rval;
> > > >       u16 retry = 10;
> > > >
> > > > +     /*cancel work*/
> > >
> > > This comment is not needed. The name of the functions you are calling
> > > have "cancel" and "work" in them so we know. If you want to add a
> > > comment explain why the cancel calls are needed here.
> > >
> >
> > Hi,
> >
> > Sorry for my late reply and thanks for your advice. Will remove it in the next
> > version of patch.
> >
> > >
> > > > +     cancel_delayed_work_sync(&qedi->recovery_work);
> > > > +     cancel_delayed_work_sync(&qedi->board_disable_work);
> > >
> > >
> > > How do you know after you have called cancel_delayed_work_sync that
> > > schedule_recovery_handler or schedule_hw_err_handler can't be called?
> > > I don't know the qed driver well, but it looks like you could have
> > > operations still running, so after you cancel here one of those ops
> > > could lead to them scheduling the work again.
> > >
> >
> > Sorry I didn't know how to make sure there's no more schedule. But I do
> > think this is important. Maybe there're someone else who can give us advice.
> >
> > Best regards,
> > Zheng
> > >
>
> Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and
> qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver.
>

Sorry for my late reply. Will apply that in next version.

Best reagrds,
Zheng

> Thanks,
> Manish
>
diff mbox series

Patch

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index f2ee49756df8..25223f6f5344 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2414,6 +2414,10 @@  static void __qedi_remove(struct pci_dev *pdev, int mode)
 	int rval;
 	u16 retry = 10;
 
+	/*cancel work*/
+	cancel_delayed_work_sync(&qedi->recovery_work);
+	cancel_delayed_work_sync(&qedi->board_disable_work);
+
 	if (mode == QEDI_MODE_NORMAL)
 		iscsi_host_remove(qedi->shost, false);
 	else if (mode == QEDI_MODE_SHUTDOWN)