[for-4.14.y,4/4] nvme_fc: fix ctrl create failures racing with workq items

Message ID 1539163789-32338-4-git-send-email-amit.pundir@linaro.org
State New

Commit Message

Amit Pundir Oct. 10, 2018, 9:29 a.m.
From: James Smart <jsmart2021@gmail.com>


commit cf25809bec2c7df4b45df5b2196845d9a4a3c89b upstream.

If there are errors during initial controller create, the transport
will tear down the partially initialized controller struct and free
the ctrl memory.  The trouble is that most of those errors can occur
due to asynchronous events such as I/O timeouts and subsystem
connectivity failures. Those failures invoke async workq items to
reset the controller and attempt a reconnect.  Those items may still be
in progress as the main thread frees the ctrl memory, resulting in a
NULL pointer oops.

Prevent this from happening by having the main ctrl failure thread
change state to DELETING and then synchronously cancel any pending
queued work items. The change of state prevents any further reset or
reconnect events from being scheduled.
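
The ordering described above (close the gate first, then drain what is
already queued) can be sketched with a small userspace model. This is only
an illustration under stated assumptions: every name below (model_ctrl,
model_queue_work, model_cancel_work, model_fail_create) is hypothetical and
not a kernel API, and the model is single-threaded, so it shows the logic
of the state check and the cancel, not the real synchronization that
cancel_work_sync() and cancel_delayed_work_sync() provide.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_DELETING };

struct work_item {
	bool pending;	/* queued but not yet run */
};

struct model_ctrl {
	enum ctrl_state state;
	struct work_item reset_work;
	struct work_item connect_work;
};

/* Queue work only while the controller is not being torn down. */
static bool model_queue_work(struct model_ctrl *c, struct work_item *w)
{
	if (c->state == CTRL_DELETING)
		return false;
	w->pending = true;
	return true;
}

/* Stand-in for a synchronous cancel: returns true if work was pending. */
static bool model_cancel_work(struct work_item *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Failure path: change state first, then cancel already-queued work. */
static void model_fail_create(struct model_ctrl *c)
{
	c->state = CTRL_DELETING;
	model_cancel_work(&c->reset_work);
	model_cancel_work(&c->connect_work);
	/* Only after this point would freeing the controller be safe. */
}
```

If the cancel ran before the state change in this model, a work item could
be queued again between the two steps, which is the window the state
change is meant to close.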

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

---
Build tested on 4.14.74 for ARCH=arm/arm64 allmodconfig.

 drivers/nvme/host/fc.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.7.4

Comments

Greg KH Oct. 11, 2018, 9:28 a.m. | #1
On Wed, Oct 10, 2018 at 02:59:49PM +0530, Amit Pundir wrote:
> From: James Smart <jsmart2021@gmail.com>
> 
> commit cf25809bec2c7df4b45df5b2196845d9a4a3c89b upstream.
> 
> If there are errors during initial controller create, the transport
> will teardown the partially initialized controller struct and free
> the ctlr memory.  Trouble is - most of those errors can occur due
> to asynchronous events happening such io timeouts and subsystem
> connectivity failures. Those failures invoke async workq items to
> reset the controller and attempt reconnect.  Those may be in progress
> as the main thread frees the ctrl memory, resulting in NULL ptr oops.
> 
> Prevent this from happening by having the main ctrl failure thread
> changing state to DELETING followed by synchronously cancelling any
> pending queued work item. The change of state will prevent the
> scheduling of resets or reconnect events.
> 
> Signed-off-by: James Smart <james.smart@broadcom.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
> ---
> Build tested on 4.14.74 for ARCH=arm/arm64 allmodconfig.

Now applied, thanks.

greg k-h

Patch

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7deb7b5d8683..058d542647dd 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2868,6 +2868,10 @@  nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	}
 
 	if (ret) {
+		nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
+		cancel_work_sync(&ctrl->ctrl.reset_work);
+		cancel_delayed_work_sync(&ctrl->connect_work);
+
 		/* couldn't schedule retry - fail out */
 		dev_err(ctrl->ctrl.device,
 			"NVME-FC{%d}: Connect retry failed\n", ctrl->cnum);