
scsi: sg: Avoid sg device teardown race

Message ID 20240318175021.22739-1-Alexander@wetzel-home.de
State Superseded
Series scsi: sg: Avoid sg device teardown race

Commit Message

Alexander Wetzel March 18, 2024, 5:50 p.m. UTC
sg_remove_sfp_usercontext() must not use sg_device_destroy() after
calling scsi_device_put().

sg_device_destroy() accesses the device queue, which is set to NULL
if scsi_device_put() removes the last reference to the sg device.

Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
---

This is my best shot at a real fix for the issue.
I confirmed with printk()s that the NULL pointer freeze happens only when
scsi_device_put() drops the last reference to the device.
In the cases where it does not crash, a reference is still left
after the call.

I don't see any obvious downside to simply swapping the calls.
The alternative would be my first patch, just without the WARN_ON.

Alexander
---
 drivers/scsi/sg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
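As a rough illustration of the ordering problem described in the commit message, here is a small userspace C model. Everything in it (the `model_*` functions, the plain `int` refcounts standing in for krefs, the `destroy_saw_queue` flag) is hypothetical and only loosely mirrors `drivers/scsi/sg.c`; it is not kernel code. To keep the focus on call order, the modeled destroy step records whether the parent's queue is still present instead of dereferencing it, and it does not free the sg device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; names only loosely mirror drivers/scsi/sg.c. */
struct request_queue { int unused; };

struct scsi_device {
	int refcount;                        /* stands in for the sdev_gendev kref */
	struct request_queue *request_queue; /* cleared on the last put */
};

struct sg_device {
	int d_ref;                           /* stands in for sdp->d_ref */
	struct scsi_device *device;          /* parent device */
	bool destroy_saw_queue;              /* what the destroy step observed */
};

/* Like scsi_device_put(): dropping the last reference tears down the queue. */
static void model_scsi_device_put(struct scsi_device *sdev)
{
	assert(sdev->refcount > 0);
	if (--sdev->refcount == 0) {
		free(sdev->request_queue);
		sdev->request_queue = NULL;
	}
}

/* Like sg_device_destroy(), which needs the parent's queue. In the kernel a
 * NULL queue here is the reported crash; the model only records it and
 * does not free the sg_device. */
static void model_sg_device_destroy(struct sg_device *sdp)
{
	sdp->destroy_saw_queue = sdp->device->request_queue != NULL;
}

/* Like kref_put(&sdp->d_ref, sg_device_destroy). */
static void model_sg_put(struct sg_device *sdp)
{
	assert(sdp->d_ref > 0);
	if (--sdp->d_ref == 0)
		model_sg_device_destroy(sdp);
}

/* Runs one teardown in the given order and reports whether the destroy
 * step still found the parent's queue. */
static bool model_teardown(bool parent_first)
{
	struct scsi_device sdev = {
		.refcount = 1,
		.request_queue = malloc(sizeof(struct request_queue)),
	};
	struct sg_device sdp = { .d_ref = 1, .device = &sdev };

	if (parent_first) {            /* order before the patch */
		model_scsi_device_put(&sdev);
		model_sg_put(&sdp);
	} else {                       /* order after the patch */
		model_sg_put(&sdp);
		model_scsi_device_put(&sdev);
	}
	return sdp.destroy_saw_queue;
}
```

With this model, `model_teardown(true)` follows the pre-patch order and reports that the queue was already gone when the destroy step ran, while `model_teardown(false)` follows the patched order and still finds the queue.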

Comments

Alexander Wetzel March 20, 2024, 11:42 a.m. UTC | #1
On 20.03.24 12:16, Greg KH wrote:
> On Wed, Mar 20, 2024 at 12:08:09PM +0100, Alexander Wetzel wrote:
>> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
>> calling scsi_device_put().
>>
>> sg_device_destroy() is accessing the parent scsi device request_queue.
>> Which will already be set to NULL when the preceding call to
>> scsi_device_put() removed the last reference to the parent scsi device.
>>
>> The resulting NULL pointer exception will then crash the kernel.
>>
>> Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
>> ---
>> Changes compared to V1:
>> Reworked the commit message
> 
> What commit id does this fix?

It's a combination of patches. I think
db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") was the 
one which finally broke it.

The sequence that in hindsight is wrong was introduced via:
c6517b7942fa ("[SCSI] sg: fix races during device removal")
and cc833acbee9d ("sg: O_EXCL and other lock handling")

Alexander
Alexander Wetzel March 20, 2024, 4:58 p.m. UTC | #2
On 20.03.24 16:02, Bart Van Assche wrote:
> On 3/20/24 04:08, Alexander Wetzel wrote:
>> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
>> calling scsi_device_put().
>>
>> sg_device_destroy() is accessing the parent scsi device request_queue.
>> Which will already be set to NULL when the preceding call to
>> scsi_device_put() removed the last reference to the parent scsi device.
>>
>> The resulting NULL pointer exception will then crash the kernel.
>>
>> Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
>> ---
>> Changes compared to V1:
>> Reworked the commit message
>>
>> Alexander
>> ---
>>   drivers/scsi/sg.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>> index 86210e4dd0d3..80e0d1981191 100644
>> --- a/drivers/scsi/sg.c
>> +++ b/drivers/scsi/sg.c
>> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>>               "sg_remove_sfp: sfp=0x%p\n", sfp));
>>       kfree(sfp);
>> -    scsi_device_put(sdp->device);
>>       kref_put(&sdp->d_ref, sg_device_destroy);
>> +    scsi_device_put(sdp->device);
>>       module_put(THIS_MODULE);
>>   }
> 
> Is it guaranteed that the above kref_put() call is the last kref_put()
> call on sdp->d_ref? If not, how about inserting code between the
> kref_put() call and the scsi_device_put() call that waits until
> sg_device_destroy() has finished?
> 

While I'm not familiar with the code, I'm pretty sure kref_put() is 
removing the last reference to d_ref here. Anything else would be odd, 
based on my - really sketchy - understanding of the flows.

Also, waiting for another process looks wrong. I guess we would then have
to delay the call to sg_release().

And at least for me it's always the last d_ref reference.
I changed the section to:

	kref_put(&sdp->d_ref, sg_device_destroy);
	printk("XXXX scsi=%u, dref=%u\n",
	       kref_read(&sdp->device->sdev_gendev.kobj.kref),
	       kref_read(&sdp->d_ref));
	scsi_device_put(sdp->device);

Then I connected and disconnected my test USB device a few times:
  XXXX scsi=2, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=2, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0

(The scsi=1 cases are the ones that would cause the NULL pointer
dereferences with the unpatched driver.)

Alexander
Bart Van Assche March 20, 2024, 5:45 p.m. UTC | #3
On 3/20/24 09:58, Alexander Wetzel wrote:
> While I'm not familiar with the code, I'm pretty sure kref_put() is 
> removing the last reference to d_ref here. Anything else would be odd, 
> based on my - really sketchy - understanding of the flows.

Please document this by adding a WARN_ON_ONCE() statement before the
kref_put() call that checks that the refcount equals one.

Thanks,

Bart.
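A userspace sketch of the check Bart asks for: in the kernel this would be a `WARN_ON_ONCE()` on `kref_read(&sdp->d_ref)` before the final `kref_put()`. Here both are replaced by hypothetical stand-ins (`model_warn_on_once()`, a plain `int` refcount in `struct model_sg_device`) so the example compiles outside the kernel. The warning fires at most once for this call site, and the put still happens either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's WARN_ON_ONCE(): reports the first
 * time cond is true for the given flag and returns cond. */
static int warnings_emitted;

static bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warnings_emitted++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical model of struct sg_device; d_ref mirrors sdp->d_ref. */
struct model_sg_device { int d_ref; };

static int destroy_calls;

/* Models the tail of sg_remove_sfp_usercontext() with the suggested check:
 * the put below is expected to drop the last reference. */
static void model_final_put(struct model_sg_device *sdp)
{
	static bool warned;   /* one flag per call site, like WARN_ON_ONCE() */

	model_warn_on_once(sdp->d_ref != 1, &warned,
			   "sdp->d_ref != 1 before final put");
	assert(sdp->d_ref > 0);
	if (--sdp->d_ref == 0)
		destroy_calls++;   /* stands in for sg_device_destroy() */
}
```

Unlike `WARN_ON_ONCE()`, which tracks its call site automatically, the model needs an explicit `static bool` flag at each site.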
Bart Van Assche March 20, 2024, 5:46 p.m. UTC | #4
On 3/20/24 04:08, Alexander Wetzel wrote:
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 86210e4dd0d3..80e0d1981191 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>   			"sg_remove_sfp: sfp=0x%p\n", sfp));
>   	kfree(sfp);
>   
> -	scsi_device_put(sdp->device);
>   	kref_put(&sdp->d_ref, sg_device_destroy);
> +	scsi_device_put(sdp->device);
>   	module_put(THIS_MODULE);
>   }

Since sg_device_destroy() frees struct sg_device and since the
scsi_device_put() call reads from struct sg_device, does this patch
introduce a use-after-free? Has it been tested with KASAN enabled?

Thanks,

Bart.
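Bart's use-after-free concern can be illustrated with another hypothetical userspace model (the `model_*` names and plain `int` refcounts are assumptions, not the sg driver's code). Here the destroy step does free the sg device, so after the final put nothing may read through `sdp`. One way to keep the swapped order without touching freed memory is to copy the parent pointer into a local before the final put, as sketched below; this only illustrates the pattern and is not code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; names loosely follow drivers/scsi/sg.c. */
struct scsi_device {
	int refcount;          /* stands in for the sdev_gendev kref */
	void *request_queue;   /* cleared when the last reference is dropped */
};

struct sg_device {
	int d_ref;             /* stands in for sdp->d_ref */
	struct scsi_device *device;
};

static bool destroy_saw_queue;

/* Like scsi_device_put(): the last put tears down the queue. */
static void model_scsi_device_put(struct scsi_device *sdev)
{
	assert(sdev->refcount > 0);
	if (--sdev->refcount == 0) {
		free(sdev->request_queue);
		sdev->request_queue = NULL;
	}
}

/* Like sg_device_destroy(): uses the parent's queue, then frees the sg_device. */
static void model_sg_device_destroy(struct sg_device *sdp)
{
	destroy_saw_queue = sdp->device->request_queue != NULL;
	free(sdp);
}

/* Tail of a hypothetical sg_remove_sfp_usercontext() model. The parent
 * pointer is copied into a local before the final put, because that put
 * may free sdp; afterwards only the local is used. */
static void model_remove_tail(struct sg_device *sdp)
{
	struct scsi_device *sdev = sdp->device;

	assert(sdp->d_ref > 0);
	if (--sdp->d_ref == 0)
		model_sg_device_destroy(sdp);   /* sdp may be freed from here on */
	model_scsi_device_put(sdev);        /* reads only the cached pointer */
}
```

Running such a model under AddressSanitizer is a cheap userspace analogue of the KASAN test Bart asks about: a version that calls `model_scsi_device_put(sdp->device)` after the final put would be reported as a heap-use-after-free.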

Patch

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 86210e4dd0d3..80e0d1981191 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2232,8 +2232,8 @@  sg_remove_sfp_usercontext(struct work_struct *work)
 			"sg_remove_sfp: sfp=0x%p\n", sfp));
 	kfree(sfp);
 
-	scsi_device_put(sdp->device);
 	kref_put(&sdp->d_ref, sg_device_destroy);
+	scsi_device_put(sdp->device);
 	module_put(THIS_MODULE);
 }