mbox series

[v2,0/3] Call blk_mq_free_tag_set() earlier

Message ID 20220630213733.17689-1-bvanassche@acm.org
Headers show
Series Call blk_mq_free_tag_set() earlier | expand

Message

Bart Van Assche June 30, 2022, 9:37 p.m. UTC
Hi Martin,

This patch series fixes a recently reported use-after-free in the SRP driver.
Please consider this patch series for kernel v5.20.

Changes compared to v1:
- Expanded this series from one to three patches.
- Fixed the description of patch 3/3.

Bart Van Assche (3):
  scsi: Simplify scsi_forget_host()
  scsi: Make scsi_forget_host() wait for request queue removal
  scsi: core: Call blk_mq_free_tag_set() earlier

 drivers/scsi/hosts.c     | 10 +++++-----
 drivers/scsi/scsi_lib.c  |  3 +++
 drivers/scsi/scsi_scan.c | 34 +++++++++++++++++++++++++++++-----
 3 files changed, 37 insertions(+), 10 deletions(-)

Comments

Ming Lei July 1, 2022, 2:37 p.m. UTC | #1
On Fri, Jul 01, 2022 at 07:07:13AM -0700, Bart Van Assche wrote:
> On 6/30/22 20:44, Ming Lei wrote:
> > On Thu, Jun 30, 2022 at 02:37:33PM -0700, Bart Van Assche wrote:
> > > There are two .exit_cmd_priv implementations. Both implementations use
> > > resources associated with the SCSI host. Make sure that these resources are
> > 
> > Please document what the exact resources associated with this SCSI host is.
> > 
> > We need the root cause.
> > 
> > I understand it might be related with module unloading, since ib_srp may
> > be gone already, but not sure if it is the exact one in this report.
> 
> It is not necessary to unload ib_srp to trigger this scenario.
> Hot-unplugging an RDMA adapter used by the ib_srp driver is sufficient.
> Hot-unplugging triggers a call of srp_remove_one(). srp_remove_one() itself
> and also its caller free resources used by srp_exit_cmd_priv(), e.g. struct
> ib_device.

OK, looks it isn't same with Changhui's report.

> 
> > > still available when .exit_cmd_priv is called by moving the .exit_cmd_priv
> > > calls from scsi_host_dev_release() to scsi_forget_host(). Moving
> > > blk_mq_free_tag_set() from scsi_host_dev_release() to scsi_forget_host() is
> > > safe because scsi_forget_host() drains all the request queues that use the
> > > host tag set. This guarantees that no requests are in flight and also that
> > > no new requests will be allocated from the host tag set.
> > > 
> > > This patch fixes the following use-after-free:
> > > 
> > > ==================================================================
> > > BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp]
> > > Read of size 8 at addr ffff888100337000 by task multipathd/16727
> > 
> > What is the 8bytes buffer which triggers UAF? what does srp_exit_cmd_priv+0x27
> > point to?
> 
> I think that Li already answered this question.

OK, from Li's input, the UAF is on the following code:

	struct srp_device *dev = target->srp_host->srp_dev;

So looks you meant target->srp_host is freed by srp_remove_one() before calling
srp_exit_cmd_priv?

Then when is srp_remove_one() triggered? And why is it called before
scsi_remove_host()? Sorry for the stupid question since I am not familiar with srp.


Thanks,
Ming
Bart Van Assche July 1, 2022, 11:58 p.m. UTC | #2
On 7/1/22 07:37, Ming Lei wrote:
> On Fri, Jul 01, 2022 at 07:07:13AM -0700, Bart Van Assche wrote:
>> On 6/30/22 20:44, Ming Lei wrote:
>>> On Thu, Jun 30, 2022 at 02:37:33PM -0700, Bart Van Assche wrote:
>>>> BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp]
>>>> Read of size 8 at addr ffff888100337000 by task multipathd/16727
>>>
>>> What is the 8bytes buffer which triggers UAF? what does srp_exit_cmd_priv+0x27
>>> point to?
>>
>> I think that Li already answered this question.
> 
> OK, from Li's input, the UAF is on the following code:
> 
> 	struct srp_device *dev = target->srp_host->srp_dev;
> 
> So looks you meant target->srp_host is freed by srp_remove_one() before calling
> srp_exit_cmd_priv?
> 
> Then when is srp_remove_one() triggered? And why is it called before
> scsi_remove_host()? Sorry for the stupid question since I am not familiar with srp.

Hi Ming,

I think that can happen as the result of the following sequence (will 
look into converting this into a blktests test):
* The Soft-RoCE (or soft-iWARP) driver is bound to a network interface.
   This results in the instantation of an RDMA interface that supports
   RDMA loopback.
* ib_srp and ib_srpt are told to connect to each other over that RDMA
   loopback interface. This results in the creation of a SCSI host and
   one or more SCSI devices.
* The Soft-RoCE (or soft-iWARP) driver is dissociated from all network
   interfaces. This causes the RDMA core to report a hot-unplug event.
   That results in a call of srp_remove_one(). I think the call chain is
   as follows:

rxe_notify()
   ib_unregister_device_queued()
     ib_unregister_work()
         __ib_unregister_device()
           disable_device()
             remove_client_context()
               srp_remove_one()

Bart.