diff mbox series

libata: Ensure ata_port probe has completed before detach

Message ID 1571221192-216909-1-git-send-email-john.garry@huawei.com
State Accepted
Commit 130f4caf145c3562108b245a576db30b916199d2
Headers show
Series libata: Ensure ata_port probe has completed before detach | expand

Commit Message

John Garry Oct. 16, 2019, 10:19 a.m. UTC
With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN:

[   23.452574] ------------[ cut here ]------------
[   23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168
[   23.466047] Modules linked in:
[   23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296
[   23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[   23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO)
[   23.491065] pc : ata_host_detach+0x15c/0x168
[   23.495322] lr : ata_host_detach+0x88/0x168
[   23.499491] sp : ffff800011cabb50
[   23.502792] x29: ffff800011cabb50 x28: 0000000000000007
[   23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28
[   23.513390] x25: 0000000000003848 x24: ffff0023ea185300
[   23.518689] x23: 0000000000000001 x22: 00000000000014c0
[   23.523987] x21: 0000000000013740 x20: ffff0023bdc20000
[   23.529286] x19: 0000000000000000 x18: 0000000000000004
[   23.534584] x17: 0000000000000001 x16: 00000000000000f0
[   23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408
[   23.545181] x13: 0000000000000000 x12: ffff0023eac13790
[   23.550480] x11: ffff0023eb76c228 x10: 0000000000000000
[   23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000
[   23.561077] x7 : 0000000000000002 x6 : 0000000000000001
[   23.566376] x5 : 0000000000000002 x4 : 0000000000000000
[   23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000
[   23.576972] x1 : 3099674201f72700 x0 : 0000000000400284
[   23.582272] Call trace:
[   23.584706]  ata_host_detach+0x15c/0x168
[   23.588616]  ata_pci_remove_one+0x10/0x18
[   23.592615]  ahci_remove_one+0x20/0x40
[   23.596356]  pci_device_remove+0x3c/0xe0
[   23.600267]  really_probe+0xdc/0x3e0
[   23.603830]  driver_probe_device+0x58/0x100
[   23.608000]  device_driver_attach+0x6c/0x90
[   23.612169]  __driver_attach+0x84/0xc8
[   23.615908]  bus_for_each_dev+0x74/0xc8
[   23.619730]  driver_attach+0x20/0x28
[   23.623292]  bus_add_driver+0x148/0x1f0
[   23.627115]  driver_register+0x60/0x110
[   23.630938]  __pci_register_driver+0x40/0x48
[   23.635199]  ahci_pci_driver_init+0x20/0x28
[   23.639372]  do_one_initcall+0x5c/0x1b0
[   23.643199]  kernel_init_freeable+0x1a4/0x24c
[   23.647546]  kernel_init+0x10/0x108
[   23.651023]  ret_from_fork+0x10/0x18
[   23.654590] ---[ end trace 634a14b675b71c13 ]---

With KASAN also enabled, we may also get many use-after-free reports.

The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may
attempt to detach the ata_port before it has been probed.

This is because the ata_ports are async probed, meaning that there is no
guarantee that the ata_port has probed prior to detach. When the ata_port
does probe in this scenario, we get all sorts of issues as the detach may
have already happened.

Fix by ensuring synchronisation with async_synchronize_full(). We could
alternatively use the cookie returned from the ata_port probe
async_schedule() call, but that means managing the cookie, so more
complicated.

Signed-off-by: John Garry <john.garry@huawei.com>

---
Note: This has only been boot tested and manual driver remove/add.
         My system has no disk attached to the ahci host.

-- 
2.17.1

Comments

Jens Axboe Oct. 16, 2019, 7:09 p.m. UTC | #1
On 10/16/19 4:19 AM, John Garry wrote:
> With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN:

> 

> [   23.452574] ------------[ cut here ]------------

> [   23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168

> [   23.466047] Modules linked in:

> [   23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296

> [   23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019

> [   23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO)

> [   23.491065] pc : ata_host_detach+0x15c/0x168

> [   23.495322] lr : ata_host_detach+0x88/0x168

> [   23.499491] sp : ffff800011cabb50

> [   23.502792] x29: ffff800011cabb50 x28: 0000000000000007

> [   23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28

> [   23.513390] x25: 0000000000003848 x24: ffff0023ea185300

> [   23.518689] x23: 0000000000000001 x22: 00000000000014c0

> [   23.523987] x21: 0000000000013740 x20: ffff0023bdc20000

> [   23.529286] x19: 0000000000000000 x18: 0000000000000004

> [   23.534584] x17: 0000000000000001 x16: 00000000000000f0

> [   23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408

> [   23.545181] x13: 0000000000000000 x12: ffff0023eac13790

> [   23.550480] x11: ffff0023eb76c228 x10: 0000000000000000

> [   23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000

> [   23.561077] x7 : 0000000000000002 x6 : 0000000000000001

> [   23.566376] x5 : 0000000000000002 x4 : 0000000000000000

> [   23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000

> [   23.576972] x1 : 3099674201f72700 x0 : 0000000000400284

> [   23.582272] Call trace:

> [   23.584706]  ata_host_detach+0x15c/0x168

> [   23.588616]  ata_pci_remove_one+0x10/0x18

> [   23.592615]  ahci_remove_one+0x20/0x40

> [   23.596356]  pci_device_remove+0x3c/0xe0

> [   23.600267]  really_probe+0xdc/0x3e0

> [   23.603830]  driver_probe_device+0x58/0x100

> [   23.608000]  device_driver_attach+0x6c/0x90

> [   23.612169]  __driver_attach+0x84/0xc8

> [   23.615908]  bus_for_each_dev+0x74/0xc8

> [   23.619730]  driver_attach+0x20/0x28

> [   23.623292]  bus_add_driver+0x148/0x1f0

> [   23.627115]  driver_register+0x60/0x110

> [   23.630938]  __pci_register_driver+0x40/0x48

> [   23.635199]  ahci_pci_driver_init+0x20/0x28

> [   23.639372]  do_one_initcall+0x5c/0x1b0

> [   23.643199]  kernel_init_freeable+0x1a4/0x24c

> [   23.647546]  kernel_init+0x10/0x108

> [   23.651023]  ret_from_fork+0x10/0x18

> [   23.654590] ---[ end trace 634a14b675b71c13 ]---

> 

> With KASAN also enabled, we may also get many use-after-free reports.

> 

> The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may

> attempt to detach the ata_port before it has been probed.

> 

> This is because the ata_ports are async probed, meaning that there is no

> guarantee that the ata_port has probed prior to detach. When the ata_port

> does probe in this scenario, we get all sorts of issues as the detach may

> have already happened.

> 

> Fix by ensuring synchronisation with async_synchronize_full(). We could

> alternatively use the cookie returned from the ata_port probe

> async_schedule() call, but that means managing the cookie, so more

> complicated.

> 

> Signed-off-by: John Garry <john.garry@huawei.com>

> ---

> Note: This has only been boot tested and manual driver remove/add.

>           My system has no disk attached to the ahci host.

> 

> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c

> index 28c492be0a57..74c9b3032d46 100644

> --- a/drivers/ata/libata-core.c

> +++ b/drivers/ata/libata-core.c

> @@ -6708,6 +6708,9 @@ void ata_host_detach(struct ata_host *host)

>   {

>   	int i;

>   

> +	/* Ensure ata_port probe has completed */

> +	async_synchronize_full();

> +

>   	for (i = 0; i < host->n_ports; i++)

>   		ata_port_detach(host->ports[i]);

>   

> 


Nice debugging, and the fix looks appropriate to me. I don't think
there's any point in trying to individually synchronize cookies.
I'll let this simmer on the list for a day or two to let other folks
take a look at it, before queuing it up.

-- 
Jens Axboe
John Garry Oct. 31, 2019, 6:35 p.m. UTC | #2
On 16/10/2019 20:09, Jens Axboe wrote:
> On 10/16/19 4:19 AM, John Garry wrote:

>> With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN:

>>

>> [   23.452574] ------------[ cut here ]------------

>> [   23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168

>> [   23.466047] Modules linked in:

>> [   23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296

>> [   23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019

>> [   23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO)

>> [   23.491065] pc : ata_host_detach+0x15c/0x168

>> [   23.495322] lr : ata_host_detach+0x88/0x168

>> [   23.499491] sp : ffff800011cabb50

>> [   23.502792] x29: ffff800011cabb50 x28: 0000000000000007

>> [   23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28

>> [   23.513390] x25: 0000000000003848 x24: ffff0023ea185300

>> [   23.518689] x23: 0000000000000001 x22: 00000000000014c0

>> [   23.523987] x21: 0000000000013740 x20: ffff0023bdc20000

>> [   23.529286] x19: 0000000000000000 x18: 0000000000000004

>> [   23.534584] x17: 0000000000000001 x16: 00000000000000f0

>> [   23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408

>> [   23.545181] x13: 0000000000000000 x12: ffff0023eac13790

>> [   23.550480] x11: ffff0023eb76c228 x10: 0000000000000000

>> [   23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000

>> [   23.561077] x7 : 0000000000000002 x6 : 0000000000000001

>> [   23.566376] x5 : 0000000000000002 x4 : 0000000000000000

>> [   23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000

>> [   23.576972] x1 : 3099674201f72700 x0 : 0000000000400284

>> [   23.582272] Call trace:

>> [   23.584706]  ata_host_detach+0x15c/0x168

>> [   23.588616]  ata_pci_remove_one+0x10/0x18

>> [   23.592615]  ahci_remove_one+0x20/0x40

>> [   23.596356]  pci_device_remove+0x3c/0xe0

>> [   23.600267]  really_probe+0xdc/0x3e0

>> [   23.603830]  driver_probe_device+0x58/0x100

>> [   23.608000]  device_driver_attach+0x6c/0x90

>> [   23.612169]  __driver_attach+0x84/0xc8

>> [   23.615908]  bus_for_each_dev+0x74/0xc8

>> [   23.619730]  driver_attach+0x20/0x28

>> [   23.623292]  bus_add_driver+0x148/0x1f0

>> [   23.627115]  driver_register+0x60/0x110

>> [   23.630938]  __pci_register_driver+0x40/0x48

>> [   23.635199]  ahci_pci_driver_init+0x20/0x28

>> [   23.639372]  do_one_initcall+0x5c/0x1b0

>> [   23.643199]  kernel_init_freeable+0x1a4/0x24c

>> [   23.647546]  kernel_init+0x10/0x108

>> [   23.651023]  ret_from_fork+0x10/0x18

>> [   23.654590] ---[ end trace 634a14b675b71c13 ]---

>>

>> With KASAN also enabled, we may also get many use-after-free reports.

>>

>> The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may

>> attempt to detach the ata_port before it has been probed.

>>

>> This is because the ata_ports are async probed, meaning that there is no

>> guarantee that the ata_port has probed prior to detach. When the ata_port

>> does probe in this scenario, we get all sorts of issues as the detach may

>> have already happened.

>>

>> Fix by ensuring synchronisation with async_synchronize_full(). We could

>> alternatively use the cookie returned from the ata_port probe

>> async_schedule() call, but that means managing the cookie, so more

>> complicated.

>>

>> Signed-off-by: John Garry <john.garry@huawei.com>

>> ---

>> Note: This has only been boot tested and manual driver remove/add.

>>            My system has no disk attached to the ahci host.

>>

>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c

>> index 28c492be0a57..74c9b3032d46 100644

>> --- a/drivers/ata/libata-core.c

>> +++ b/drivers/ata/libata-core.c

>> @@ -6708,6 +6708,9 @@ void ata_host_detach(struct ata_host *host)

>>    {

>>    	int i;

>>    

>> +	/* Ensure ata_port probe has completed */

>> +	async_synchronize_full();

>> +

>>    	for (i = 0; i < host->n_ports; i++)

>>    		ata_port_detach(host->ports[i]);

>>    

>>

> 

> Nice debugging, and the fix looks appropriate to me. I don't think

> there's any point in trying to individually synchronize cookies.

> I'll let this simmer on the list for a day or two to let other folks

> take a look at it, before queuing it up.

> 


Hi Jens,

FWIW, I did also now test this on qemu with an emulated disk and it was ok.

Anyway, I don't mind if prefer to queue this early for 5.6 so it can sit 
on next for longer.

Cheers,
John
Jens Axboe Oct. 31, 2019, 7:19 p.m. UTC | #3
On 10/31/19 12:35 PM, John Garry wrote:
> On 16/10/2019 20:09, Jens Axboe wrote:

>> On 10/16/19 4:19 AM, John Garry wrote:

>>> With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN:

>>>

>>> [   23.452574] ------------[ cut here ]------------

>>> [   23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168

>>> [   23.466047] Modules linked in:

>>> [   23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296

>>> [   23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019

>>> [   23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO)

>>> [   23.491065] pc : ata_host_detach+0x15c/0x168

>>> [   23.495322] lr : ata_host_detach+0x88/0x168

>>> [   23.499491] sp : ffff800011cabb50

>>> [   23.502792] x29: ffff800011cabb50 x28: 0000000000000007

>>> [   23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28

>>> [   23.513390] x25: 0000000000003848 x24: ffff0023ea185300

>>> [   23.518689] x23: 0000000000000001 x22: 00000000000014c0

>>> [   23.523987] x21: 0000000000013740 x20: ffff0023bdc20000

>>> [   23.529286] x19: 0000000000000000 x18: 0000000000000004

>>> [   23.534584] x17: 0000000000000001 x16: 00000000000000f0

>>> [   23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408

>>> [   23.545181] x13: 0000000000000000 x12: ffff0023eac13790

>>> [   23.550480] x11: ffff0023eb76c228 x10: 0000000000000000

>>> [   23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000

>>> [   23.561077] x7 : 0000000000000002 x6 : 0000000000000001

>>> [   23.566376] x5 : 0000000000000002 x4 : 0000000000000000

>>> [   23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000

>>> [   23.576972] x1 : 3099674201f72700 x0 : 0000000000400284

>>> [   23.582272] Call trace:

>>> [   23.584706]  ata_host_detach+0x15c/0x168

>>> [   23.588616]  ata_pci_remove_one+0x10/0x18

>>> [   23.592615]  ahci_remove_one+0x20/0x40

>>> [   23.596356]  pci_device_remove+0x3c/0xe0

>>> [   23.600267]  really_probe+0xdc/0x3e0

>>> [   23.603830]  driver_probe_device+0x58/0x100

>>> [   23.608000]  device_driver_attach+0x6c/0x90

>>> [   23.612169]  __driver_attach+0x84/0xc8

>>> [   23.615908]  bus_for_each_dev+0x74/0xc8

>>> [   23.619730]  driver_attach+0x20/0x28

>>> [   23.623292]  bus_add_driver+0x148/0x1f0

>>> [   23.627115]  driver_register+0x60/0x110

>>> [   23.630938]  __pci_register_driver+0x40/0x48

>>> [   23.635199]  ahci_pci_driver_init+0x20/0x28

>>> [   23.639372]  do_one_initcall+0x5c/0x1b0

>>> [   23.643199]  kernel_init_freeable+0x1a4/0x24c

>>> [   23.647546]  kernel_init+0x10/0x108

>>> [   23.651023]  ret_from_fork+0x10/0x18

>>> [   23.654590] ---[ end trace 634a14b675b71c13 ]---

>>>

>>> With KASAN also enabled, we may also get many use-after-free reports.

>>>

>>> The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may

>>> attempt to detach the ata_port before it has been probed.

>>>

>>> This is because the ata_ports are async probed, meaning that there is no

>>> guarantee that the ata_port has probed prior to detach. When the ata_port

>>> does probe in this scenario, we get all sorts of issues as the detach may

>>> have already happened.

>>>

>>> Fix by ensuring synchronisation with async_synchronize_full(). We could

>>> alternatively use the cookie returned from the ata_port probe

>>> async_schedule() call, but that means managing the cookie, so more

>>> complicated.

>>>

>>> Signed-off-by: John Garry <john.garry@huawei.com>

>>> ---

>>> Note: This has only been boot tested and manual driver remove/add.

>>>             My system has no disk attached to the ahci host.

>>>

>>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c

>>> index 28c492be0a57..74c9b3032d46 100644

>>> --- a/drivers/ata/libata-core.c

>>> +++ b/drivers/ata/libata-core.c

>>> @@ -6708,6 +6708,9 @@ void ata_host_detach(struct ata_host *host)

>>>     {

>>>     	int i;

>>>     

>>> +	/* Ensure ata_port probe has completed */

>>> +	async_synchronize_full();

>>> +

>>>     	for (i = 0; i < host->n_ports; i++)

>>>     		ata_port_detach(host->ports[i]);

>>>     

>>>

>>

>> Nice debugging, and the fix looks appropriate to me. I don't think

>> there's any point in trying to individually synchronize cookies.

>> I'll let this simmer on the list for a day or two to let other folks

>> take a look at it, before queuing it up.

>>

> 

> Hi Jens,

> 

> FWIW, I did also now test this on qemu with an emulated disk and it was ok.

> 

> Anyway, I don't mind if prefer to queue this early for 5.6 so it can sit

> on next for longer.


I've queued it up for 5.5, no point waiting one extra release :-)

-- 
Jens Axboe
diff mbox series

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 28c492be0a57..74c9b3032d46 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6708,6 +6708,9 @@  void ata_host_detach(struct ata_host *host)
 {
 	int i;
 
+	/* Ensure ata_port probe has completed */
+	async_synchronize_full();
+
 	for (i = 0; i < host->n_ports; i++)
 		ata_port_detach(host->ports[i]);