mbox series

[v3,00/12] Bug fixes and improvements for MHI power operations

Message ID 1604031057-32820-1-git-send-email-bbhatt@codeaurora.org
Headers show
Series Bug fixes and improvements for MHI power operations | expand

Message

Bhaumik Bhatt Oct. 30, 2020, 4:10 a.m. UTC
Bug fixes and improvements for MHI powerup and shutdown handling.
Firmware load function names are updated to accurately reflect their purpose.
Closed certain design gaps where the host (MHI bus) would allow clients to
operate after a power down or error detection.
Move to an error state sooner based on different scenarios.

These patches were tested on arm64 and X86_64 architectures.

v3:
-Fixed bus: mhi: core: Remove MHI event ring IRQ handlers when powering down
-Mistakenly placed the free_irq() calls in mhi_pm_sys_error_transition()
-Moved it to mhi_pm_disable_transition()

v2:
-Addressed patches based on review comments and made improvements
-Added bus: mhi: core: Check for IRQ availability during registration
-Dropped bus: mhi: core: Use the IRQF_ONESHOT flag for the BHI interrupt line
-Split bus: mhi: core: Move to an error state on any firmware load failure
-Modified the following patches:
-bus: mhi: core: Disable IRQs when powering down
-bus: mhi: core: Improve shutdown handling after link down detection
-bus: mhi: core: Mark device inactive soon after host issues a shutdown
-bus: mhi: core: Move to SYS_ERROR regardless of RDDM capability
-Addressed the above as follow-up patches with improvements:
-bus: mhi: core: Prevent sending multiple RDDM entry callbacks
-bus: mhi: core: Separate system error and power down handling
-bus: mhi: core: Remove MHI event ring IRQ handlers when powering down

Bhaumik Bhatt (12):
  bus: mhi: core: Use appropriate names for firmware load functions
  bus: mhi: core: Move to using high priority workqueue
  bus: mhi: core: Skip device wake in error or shutdown states
  bus: mhi: core: Move to SYS_ERROR regardless of RDDM capability
  bus: mhi: core: Prevent sending multiple RDDM entry callbacks
  bus: mhi: core: Move to an error state on any firmware load failure
  bus: mhi: core: Use appropriate label in firmware load handler API
  bus: mhi: core: Move to an error state on mission mode failure
  bus: mhi: core: Check for IRQ availability during registration
  bus: mhi: core: Separate system error and power down handling
  bus: mhi: core: Mark and maintain device states early on after power
    down
  bus: mhi: core: Remove MHI event ring IRQ handlers when powering down

 drivers/bus/mhi/core/boot.c |  60 ++++++-----
 drivers/bus/mhi/core/init.c |  10 +-
 drivers/bus/mhi/core/main.c |  16 +--
 drivers/bus/mhi/core/pm.c   | 236 ++++++++++++++++++++++++++++++++------------
 include/linux/mhi.h         |   2 +
 5 files changed, 225 insertions(+), 99 deletions(-)

Comments

Manivannan Sadhasivam Oct. 30, 2020, 1:29 p.m. UTC | #1
On Thu, Oct 29, 2020 at 09:10:46PM -0700, Bhaumik Bhatt wrote:
> mhi_fw_load_sbl() function is currently used to transfer SBL or EDL

> images over BHI (Boot Host Interface). Same goes with mhi_fw_load_amss()

> which uses BHIe. However, the contents of these functions do not

> indicate support for a specific set of images. Since these can be used

> for any image download over BHI or BHIe, rename them based on the

> protocol used.

> 

> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>


Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>


Thanks,
Mani

> ---

>  drivers/bus/mhi/core/boot.c | 19 ++++++++++---------

>  1 file changed, 10 insertions(+), 9 deletions(-)

> 

> diff --git a/drivers/bus/mhi/core/boot.c b/drivers/bus/mhi/core/boot.c

> index 24422f5..7d6b3a7 100644

> --- a/drivers/bus/mhi/core/boot.c

> +++ b/drivers/bus/mhi/core/boot.c

> @@ -171,7 +171,7 @@ int mhi_download_rddm_img(struct mhi_controller *mhi_cntrl, bool in_panic)

>  }

>  EXPORT_SYMBOL_GPL(mhi_download_rddm_img);

>  

> -static int mhi_fw_load_amss(struct mhi_controller *mhi_cntrl,

> +static int mhi_fw_load_bhie(struct mhi_controller *mhi_cntrl,

>  			    const struct mhi_buf *mhi_buf)

>  {

>  	void __iomem *base = mhi_cntrl->bhie;

> @@ -187,7 +187,7 @@ static int mhi_fw_load_amss(struct mhi_controller *mhi_cntrl,

>  	}

>  

>  	sequence_id = MHI_RANDOM_U32_NONZERO(BHIE_TXVECSTATUS_SEQNUM_BMSK);

> -	dev_dbg(dev, "Starting AMSS download via BHIe. Sequence ID:%u\n",

> +	dev_dbg(dev, "Starting image download via BHIe. Sequence ID: %u\n",

>  		sequence_id);

>  	mhi_write_reg(mhi_cntrl, base, BHIE_TXVECADDR_HIGH_OFFS,

>  		      upper_32_bits(mhi_buf->dma_addr));

> @@ -218,7 +218,7 @@ static int mhi_fw_load_amss(struct mhi_controller *mhi_cntrl,

>  	return (!ret) ? -ETIMEDOUT : 0;

>  }

>  

> -static int mhi_fw_load_sbl(struct mhi_controller *mhi_cntrl,

> +static int mhi_fw_load_bhi(struct mhi_controller *mhi_cntrl,

>  			   dma_addr_t dma_addr,

>  			   size_t size)

>  {

> @@ -245,7 +245,7 @@ static int mhi_fw_load_sbl(struct mhi_controller *mhi_cntrl,

>  	}

>  

>  	session_id = MHI_RANDOM_U32_NONZERO(BHI_TXDB_SEQNUM_BMSK);

> -	dev_dbg(dev, "Starting SBL download via BHI. Session ID:%u\n",

> +	dev_dbg(dev, "Starting image download via BHI. Session ID: %u\n",

>  		session_id);

>  	mhi_write_reg(mhi_cntrl, base, BHI_STATUS, 0);

>  	mhi_write_reg(mhi_cntrl, base, BHI_IMGADDR_HIGH,

> @@ -446,9 +446,9 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)

>  		return;

>  	}

>  

> -	/* Download SBL image */

> +	/* Download image using BHI */

>  	memcpy(buf, firmware->data, size);

> -	ret = mhi_fw_load_sbl(mhi_cntrl, dma_addr, size);

> +	ret = mhi_fw_load_bhi(mhi_cntrl, dma_addr, size);

>  	mhi_free_coherent(mhi_cntrl, size, buf, dma_addr);

>  

>  	if (!mhi_cntrl->fbc_download || ret || mhi_cntrl->ee == MHI_EE_EDL)

> @@ -456,7 +456,7 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)

>  

>  	/* Error or in EDL mode, we're done */

>  	if (ret) {

> -		dev_err(dev, "MHI did not load SBL, ret:%d\n", ret);

> +		dev_err(dev, "MHI did not load image over BHI, ret: %d\n", ret);

>  		return;

>  	}

>  

> @@ -506,11 +506,12 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)

>  

>  	/* Start full firmware image download */

>  	image_info = mhi_cntrl->fbc_image;

> -	ret = mhi_fw_load_amss(mhi_cntrl,

> +	ret = mhi_fw_load_bhie(mhi_cntrl,

>  			       /* Vector table is the last entry */

>  			       &image_info->mhi_buf[image_info->entries - 1]);

>  	if (ret)

> -		dev_err(dev, "MHI did not load AMSS, ret:%d\n", ret);

> +		dev_err(dev, "MHI did not load image over BHIe, ret: %d\n",

> +			ret);

>  

>  	release_firmware(firmware);

>  

> -- 

> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,

> a Linux Foundation Collaborative Project

>
Manivannan Sadhasivam Oct. 30, 2020, 1:52 p.m. UTC | #2
On Thu, Oct 29, 2020 at 09:10:49PM -0700, Bhaumik Bhatt wrote:
> In some cases, the entry of device to RDDM execution environment

> can occur after a significant amount of time has elapsed and a

> SYS_ERROR state change event has already arrived.


I don't quite understand this statement. Can you elaborate? This doesn't
relate to what the patch is doing.

> This can result

> in scenarios where MHI controller and client drivers are unaware

> of the error state of the device. Remove the check for rddm_image

> when processing the SYS_ERROR state change as it is present in

> mhi_pm_sys_err_handler() already and prevent further activity

> until the expected RDDM execution environment change occurs or

> the controller driver decides further action.

> 

> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

> ---

>  drivers/bus/mhi/core/main.c | 12 ++++--------

>  1 file changed, 4 insertions(+), 8 deletions(-)

> 

> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c

> index 2cff5dd..1f32d67 100644

> --- a/drivers/bus/mhi/core/main.c

> +++ b/drivers/bus/mhi/core/main.c

> @@ -733,19 +733,15 @@ int mhi_process_ctrl_ev_ring(struct mhi_controller *mhi_cntrl,

>  				break;

>  			case MHI_STATE_SYS_ERR:

>  			{

> -				enum mhi_pm_state new_state;

> -

> -				/* skip SYS_ERROR handling if RDDM supported */

> -				if (mhi_cntrl->ee == MHI_EE_RDDM ||

> -				    mhi_cntrl->rddm_image)

> -					break;

> +				enum mhi_pm_state state = MHI_PM_STATE_MAX;

>  

>  				dev_dbg(dev, "System error detected\n");

>  				write_lock_irq(&mhi_cntrl->pm_lock);

> -				new_state = mhi_tryset_pm_state(mhi_cntrl,

> +				if (mhi_cntrl->ee != MHI_EE_RDDM)


But you are still checking for RDDM EE?

Please explain why you want to skip RDDM check.

Thanks,
Mani

> +					state = mhi_tryset_pm_state(mhi_cntrl,

>  							MHI_PM_SYS_ERR_DETECT);

>  				write_unlock_irq(&mhi_cntrl->pm_lock);

> -				if (new_state == MHI_PM_SYS_ERR_DETECT)

> +				if (state == MHI_PM_SYS_ERR_DETECT)

>  					mhi_pm_sys_err_handler(mhi_cntrl);

>  				break;

>  			}

> -- 

> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,

> a Linux Foundation Collaborative Project

>
Manivannan Sadhasivam Oct. 30, 2020, 1:56 p.m. UTC | #3
On Thu, Oct 29, 2020 at 09:10:50PM -0700, Bhaumik Bhatt wrote:
> If an mhi_power_down() is initiated after the device has entered

> RDDM and a status callback was provided for it, it is possible

> that another BHI interrupt fires while waiting for the MHI

> RESET to be cleared. If that happens, MHI host would have moved

> a "disabled" execution environment and the check to allow sending

> an RDDM status callback will pass when it is should not. Add a

> check to see if MHI is in an active state before proceeding.

> 

> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>


Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>


Thanks,
Mani

> ---

>  drivers/bus/mhi/core/main.c | 4 ++++

>  1 file changed, 4 insertions(+)

> 

> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c

> index 1f32d67..172b48b 100644

> --- a/drivers/bus/mhi/core/main.c

> +++ b/drivers/bus/mhi/core/main.c

> @@ -399,6 +399,10 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)

>  

>  	 /* If device supports RDDM don't bother processing SYS error */

>  	if (mhi_cntrl->rddm_image) {

> +		/* host may be performing a device power down already */

> +		if (!mhi_is_active(mhi_cntrl))

> +			goto exit_intvec;

> +

>  		if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {

>  			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);

>  			wake_up_all(&mhi_cntrl->state_event);

> -- 

> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,

> a Linux Foundation Collaborative Project

>
Manivannan Sadhasivam Oct. 30, 2020, 2:02 p.m. UTC | #4
On Thu, Oct 29, 2020 at 09:10:54PM -0700, Bhaumik Bhatt wrote:
> Current design allows a controller to register with MHI successfully

> without the need to have any IRQs available for use. If no IRQs are

> available, power up requests to MHI can fail after a successful

> registration with MHI. Improve the design by checking for the number

> of IRQs available sooner within the mhi_regsiter_controller() API as

> it is required to be specified by the controller.

> 

> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>


Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>


Thanks,
Mani

> ---

>  drivers/bus/mhi/core/init.c | 2 +-

>  drivers/bus/mhi/core/pm.c   | 3 ---

>  2 files changed, 1 insertion(+), 4 deletions(-)

> 

> diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c

> index 23b6dd6..5dd9e39 100644

> --- a/drivers/bus/mhi/core/init.c

> +++ b/drivers/bus/mhi/core/init.c

> @@ -858,7 +858,7 @@ int mhi_register_controller(struct mhi_controller *mhi_cntrl,

>  

>  	if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put ||

>  	    !mhi_cntrl->status_cb || !mhi_cntrl->read_reg ||

> -	    !mhi_cntrl->write_reg)

> +	    !mhi_cntrl->write_reg || !mhi_cntrl->nr_irqs)

>  		return -EINVAL;

>  

>  	ret = parse_config(mhi_cntrl, config);

> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c

> index 06adea2..1d04e401 100644

> --- a/drivers/bus/mhi/core/pm.c

> +++ b/drivers/bus/mhi/core/pm.c

> @@ -926,9 +926,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)

>  

>  	dev_info(dev, "Requested to power ON\n");

>  

> -	if (mhi_cntrl->nr_irqs < 1)

> -		return -EINVAL;

> -

>  	/* Supply default wake routines if not provided by controller driver */

>  	if (!mhi_cntrl->wake_get || !mhi_cntrl->wake_put ||

>  	    !mhi_cntrl->wake_toggle) {

> -- 

> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,

> a Linux Foundation Collaborative Project

>
Manivannan Sadhasivam Oct. 30, 2020, 2:06 p.m. UTC | #5
On Thu, Oct 29, 2020 at 09:10:55PM -0700, Bhaumik Bhatt wrote:
> Currently, there exist a set of if...else statements in the

> mhi_pm_disable_transition() function which make handling system

> error and disable transitions differently complex. To make that

> cleaner and facilitate differences in behavior, separate these

> two transitions for MHI host.

> 


And this results in a lot of duplicated code :/

Thanks,
Mani

> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

> ---

>  drivers/bus/mhi/core/pm.c | 159 +++++++++++++++++++++++++++++++++++++++-------

>  1 file changed, 137 insertions(+), 22 deletions(-)

> 

> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c

> index 1d04e401..347ae7d 100644

> --- a/drivers/bus/mhi/core/pm.c

> +++ b/drivers/bus/mhi/core/pm.c

> @@ -444,7 +444,7 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)

>  	return ret;

>  }

>  

> -/* Handle SYS_ERR and Shutdown transitions */

> +/* Handle shutdown transitions */

>  static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl,

>  				      enum mhi_pm_state transition_state)

>  {

> @@ -460,10 +460,6 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl,

>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>  		to_mhi_pm_state_str(transition_state));

>  

> -	/* We must notify MHI control driver so it can clean up first */

> -	if (transition_state == MHI_PM_SYS_ERR_PROCESS)

> -		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

> -

>  	mutex_lock(&mhi_cntrl->pm_mutex);

>  	write_lock_irq(&mhi_cntrl->pm_lock);

>  	prev_state = mhi_cntrl->pm_state;

> @@ -502,11 +498,8 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl,

>  							    MHICTRL_RESET_SHIFT,

>  							    &in_reset) ||

>  					!in_reset, timeout);

> -		if ((!ret || in_reset) && cur_state == MHI_PM_SYS_ERR_PROCESS) {

> +		if (!ret || in_reset)

>  			dev_err(dev, "Device failed to exit MHI Reset state\n");

> -			mutex_unlock(&mhi_cntrl->pm_mutex);

> -			return;

> -		}

>  

>  		/*

>  		 * Device will clear BHI_INTVEC as a part of RESET processing,

> @@ -566,19 +559,142 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl,

>  		er_ctxt->wp = er_ctxt->rbase;

>  	}

>  

> -	if (cur_state == MHI_PM_SYS_ERR_PROCESS) {

> -		mhi_ready_state_transition(mhi_cntrl);

> -	} else {

> -		/* Move to disable state */

> -		write_lock_irq(&mhi_cntrl->pm_lock);

> -		cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

> -		write_unlock_irq(&mhi_cntrl->pm_lock);

> -		if (unlikely(cur_state != MHI_PM_DISABLE))

> -			dev_err(dev, "Error moving from PM state: %s to: %s\n",

> -				to_mhi_pm_state_str(cur_state),

> -				to_mhi_pm_state_str(MHI_PM_DISABLE));

> +	/* Move to disable state */

> +	write_lock_irq(&mhi_cntrl->pm_lock);

> +	cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

> +	write_unlock_irq(&mhi_cntrl->pm_lock);

> +	if (unlikely(cur_state != MHI_PM_DISABLE))

> +		dev_err(dev, "Error moving from PM state: %s to: %s\n",

> +			to_mhi_pm_state_str(cur_state),

> +			to_mhi_pm_state_str(MHI_PM_DISABLE));

> +

> +	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

> +		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

> +

> +	mutex_unlock(&mhi_cntrl->pm_mutex);

> +}

> +

> +/* Handle system error transitions */

> +static void mhi_pm_sys_error_transition(struct mhi_controller *mhi_cntrl)

> +{

> +	enum mhi_pm_state cur_state, prev_state;

> +	struct mhi_event *mhi_event;

> +	struct mhi_cmd_ctxt *cmd_ctxt;

> +	struct mhi_cmd *mhi_cmd;

> +	struct mhi_event_ctxt *er_ctxt;

> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;

> +	int ret, i;

> +

> +	dev_dbg(dev, "Transitioning from PM state: %s to: %s\n",

> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

> +		to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

> +

> +	/* We must notify MHI control driver so it can clean up first */

> +	mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

> +

> +	mutex_lock(&mhi_cntrl->pm_mutex);

> +	write_lock_irq(&mhi_cntrl->pm_lock);

> +	prev_state = mhi_cntrl->pm_state;

> +	cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_SYS_ERR_PROCESS);

> +	write_unlock_irq(&mhi_cntrl->pm_lock);

> +

> +	if (cur_state != MHI_PM_SYS_ERR_PROCESS) {

> +		dev_err(dev, "Failed to transition from PM state: %s to: %s\n",

> +			to_mhi_pm_state_str(cur_state),

> +			to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

> +		goto exit_sys_error_transition;

> +	}

> +

> +	mhi_cntrl->ee = MHI_EE_DISABLE_TRANSITION;

> +	mhi_cntrl->dev_state = MHI_STATE_RESET;

> +

> +	/* Wake up threads waiting for state transition */

> +	wake_up_all(&mhi_cntrl->state_event);

> +

> +	/* Trigger MHI RESET so that the device will not access host memory */

> +	if (MHI_REG_ACCESS_VALID(prev_state)) {

> +		u32 in_reset = -1;

> +		unsigned long timeout = msecs_to_jiffies(mhi_cntrl->timeout_ms);

> +

> +		dev_dbg(dev, "Triggering MHI Reset in device\n");

> +		mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);

> +

> +		/* Wait for the reset bit to be cleared by the device */

> +		ret = wait_event_timeout(mhi_cntrl->state_event,

> +					 mhi_read_reg_field(mhi_cntrl,

> +							    mhi_cntrl->regs,

> +							    MHICTRL,

> +							    MHICTRL_RESET_MASK,

> +							    MHICTRL_RESET_SHIFT,

> +							    &in_reset) ||

> +					!in_reset, timeout);

> +		if (!ret || in_reset) {

> +			dev_err(dev, "Device failed to exit MHI Reset state\n");

> +			goto exit_sys_error_transition;

> +		}

> +

> +		/*

> +		 * Device will clear BHI_INTVEC as a part of RESET processing,

> +		 * hence re-program it

> +		 */

> +		mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);

> +	}

> +

> +	dev_dbg(dev,

> +		"Waiting for all pending event ring processing to complete\n");

> +	mhi_event = mhi_cntrl->mhi_event;

> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, mhi_event++) {

> +		if (mhi_event->offload_ev)

> +			continue;

> +		tasklet_kill(&mhi_event->task);

>  	}

>  

> +	/* Release lock and wait for all pending threads to complete */

> +	mutex_unlock(&mhi_cntrl->pm_mutex);

> +	dev_dbg(dev, "Waiting for all pending threads to complete\n");

> +	wake_up_all(&mhi_cntrl->state_event);

> +

> +	dev_dbg(dev, "Reset all active channels and remove MHI devices\n");

> +	device_for_each_child(mhi_cntrl->cntrl_dev, NULL, mhi_destroy_device);

> +

> +	mutex_lock(&mhi_cntrl->pm_mutex);

> +

> +	WARN_ON(atomic_read(&mhi_cntrl->dev_wake));

> +	WARN_ON(atomic_read(&mhi_cntrl->pending_pkts));

> +

> +	/* Reset the ev rings and cmd rings */

> +	dev_dbg(dev, "Resetting EV CTXT and CMD CTXT\n");

> +	mhi_cmd = mhi_cntrl->mhi_cmd;

> +	cmd_ctxt = mhi_cntrl->mhi_ctxt->cmd_ctxt;

> +	for (i = 0; i < NR_OF_CMD_RINGS; i++, mhi_cmd++, cmd_ctxt++) {

> +		struct mhi_ring *ring = &mhi_cmd->ring;

> +

> +		ring->rp = ring->base;

> +		ring->wp = ring->base;

> +		cmd_ctxt->rp = cmd_ctxt->rbase;

> +		cmd_ctxt->wp = cmd_ctxt->rbase;

> +	}

> +

> +	mhi_event = mhi_cntrl->mhi_event;

> +	er_ctxt = mhi_cntrl->mhi_ctxt->er_ctxt;

> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, er_ctxt++,

> +	     mhi_event++) {

> +		struct mhi_ring *ring = &mhi_event->ring;

> +

> +		/* Skip offload events */

> +		if (mhi_event->offload_ev)

> +			continue;

> +

> +		ring->rp = ring->base;

> +		ring->wp = ring->base;

> +		er_ctxt->rp = er_ctxt->rbase;

> +		er_ctxt->wp = er_ctxt->rbase;

> +	}

> +

> +	mhi_ready_state_transition(mhi_cntrl);

> +

> +exit_sys_error_transition:

>  	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>  		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

> @@ -666,8 +782,7 @@ void mhi_pm_st_worker(struct work_struct *work)

>  			mhi_ready_state_transition(mhi_cntrl);

>  			break;

>  		case DEV_ST_TRANSITION_SYS_ERR:

> -			mhi_pm_disable_transition

> -				(mhi_cntrl, MHI_PM_SYS_ERR_PROCESS);

> +			mhi_pm_sys_error_transition(mhi_cntrl);

>  			break;

>  		case DEV_ST_TRANSITION_DISABLE:

>  			mhi_pm_disable_transition

> -- 

> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,

> a Linux Foundation Collaborative Project

>
Manivannan Sadhasivam Oct. 30, 2020, 2:11 p.m. UTC | #6
On Thu, Oct 29, 2020 at 09:10:57PM -0700, Bhaumik Bhatt wrote:
> While powering down, the device may or may not acknowledge an MHI

> RESET issued by host for a graceful shutdown scenario and end up

> sending an incoming data packet after tasklets have been killed.

> If a rogue device sends this interrupt for a data transfer event

> ring update, it can result in a tasklet getting scheduled while a

> clean up is ongoing or has completed and cause access to freed

> memory leading to a NULL pointer exception. Remove the interrupt

> handlers for MHI event rings early on to avoid this scenario.

> 

> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>


Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>


Thanks,
Mani

> ---

>  drivers/bus/mhi/core/pm.c | 3 ++-

>  1 file changed, 2 insertions(+), 1 deletion(-)

> 

> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c

> index ffbf6f5..a671f58 100644

> --- a/drivers/bus/mhi/core/pm.c

> +++ b/drivers/bus/mhi/core/pm.c

> @@ -494,6 +494,7 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl)

>  	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, mhi_event++) {

>  		if (mhi_event->offload_ev)

>  			continue;

> +		free_irq(mhi_cntrl->irq[mhi_event->irq], mhi_event);

>  		tasklet_kill(&mhi_event->task);

>  	}

>  

> @@ -1164,7 +1165,7 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful)

>  	/* Wait for shutdown to complete */

>  	flush_work(&mhi_cntrl->st_worker);

>  

> -	mhi_deinit_free_irq(mhi_cntrl);

> +	free_irq(mhi_cntrl->irq[0], mhi_cntrl);

>  

>  	if (!mhi_cntrl->pre_init) {

>  		/* Free all allocated resources */

> -- 

> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,

> a Linux Foundation Collaborative Project

>
Bhaumik Bhatt Oct. 30, 2020, 7:29 p.m. UTC | #7
Hi Mani,

On 2020-10-30 06:52, Manivannan Sadhasivam wrote:
> On Thu, Oct 29, 2020 at 09:10:49PM -0700, Bhaumik Bhatt wrote:

>> In some cases, the entry of device to RDDM execution environment

>> can occur after a significant amount of time has elapsed and a

>> SYS_ERROR state change event has already arrived.

> 

> I don't quite understand this statement. Can you elaborate? This 

> doesn't

> relate to what the patch is doing.

> 

So the mhi_intvec_threaded_handler() (BHI) MSI that fires to switch the 
EE
to RDDM may come much later than the SYS_ERROR state change event from 
the
control event ring. We currently, do not move to MHI_PM_SYS_ERR_DETECT
state if RDDM is supported i.e. mhi_cntrl->rddm_image is set. However, 
it
means that we will remain in an "active" MHI PM state for the duration 
of
time until RDDM EE (BHI) MSI comes in. We have seen it take 5 seconds in
some bad cases.
>> This can result

>> in scenarios where MHI controller and client drivers are unaware

>> of the error state of the device. Remove the check for rddm_image

>> when processing the SYS_ERROR state change as it is present in

>> mhi_pm_sys_err_handler() already and prevent further activity

>> until the expected RDDM execution environment change occurs or

>> the controller driver decides further action.

>> 

>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

>> ---

>>  drivers/bus/mhi/core/main.c | 12 ++++--------

>>  1 file changed, 4 insertions(+), 8 deletions(-)

>> 

>> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c

>> index 2cff5dd..1f32d67 100644

>> --- a/drivers/bus/mhi/core/main.c

>> +++ b/drivers/bus/mhi/core/main.c

>> @@ -733,19 +733,15 @@ int mhi_process_ctrl_ev_ring(struct 

>> mhi_controller *mhi_cntrl,

>>  				break;

>>  			case MHI_STATE_SYS_ERR:

>>  			{

>> -				enum mhi_pm_state new_state;

>> -

>> -				/* skip SYS_ERROR handling if RDDM supported */

>> -				if (mhi_cntrl->ee == MHI_EE_RDDM ||

>> -				    mhi_cntrl->rddm_image)

>> -					break;

>> +				enum mhi_pm_state state = MHI_PM_STATE_MAX;

>> 

>>  				dev_dbg(dev, "System error detected\n");

>>  				write_lock_irq(&mhi_cntrl->pm_lock);

>> -				new_state = mhi_tryset_pm_state(mhi_cntrl,

>> +				if (mhi_cntrl->ee != MHI_EE_RDDM)

> 

> But you are still checking for RDDM EE?

> 

> Please explain why you want to skip RDDM check.

> 

> Thanks,

> Mani

> 

Yes, the point is to only remove the mhi_cntrl->rddm_image check but 
still
retain the "has EE moved to become RDDM" check. This allows us to avoid 
any
extra processing of moving states to MHI_PM_SYS_ERR_DETECT state if it 
may
be unnecessary (EE already changed to RDDM). The mhi_cntrl->rddm_image 
is
also present in mhi_pm_sys_err_handler(mhi_cntrl) function so it is not
needed here.
>> +					state = mhi_tryset_pm_state(mhi_cntrl,

>>  							MHI_PM_SYS_ERR_DETECT);

>>  				write_unlock_irq(&mhi_cntrl->pm_lock);

>> -				if (new_state == MHI_PM_SYS_ERR_DETECT)

>> +				if (state == MHI_PM_SYS_ERR_DETECT)

>>  					mhi_pm_sys_err_handler(mhi_cntrl);

>>  				break;

>>  			}

>> --

>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 

>> Forum,

>> a Linux Foundation Collaborative Project

>> 


This is why I mention the word RDDM "capability". If controller supports 
RDDM
is not enough to skip the move to MHI_PM_SYS_ERR_DETECT state as it is 
safer
to move and stop client drivers from pushing data.

Thanks,
Bhaumik
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project
Bhaumik Bhatt Oct. 30, 2020, 7:34 p.m. UTC | #8
Hi Mani,

On 2020-10-30 07:06, Manivannan Sadhasivam wrote:
> On Thu, Oct 29, 2020 at 09:10:55PM -0700, Bhaumik Bhatt wrote:

>> Currently, there exist a set of if...else statements in the

>> mhi_pm_disable_transition() function which make handling system

>> error and disable transitions differently complex. To make that

>> cleaner and facilitate differences in behavior, separate these

>> two transitions for MHI host.

>> 

> 

> And this results in a lot of duplicated code :/

> 

> Thanks,

> Mani

> 


I knew this was coming. Mainly, we can avoid adding confusing if...else
statements that plague the current mhi_pm_disable_transition() function 
and in
return for some duplicate code, we can make handling separate use cases 
easier
as they could pop-up anytime in the future.

>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

>> ---

>>  drivers/bus/mhi/core/pm.c | 159 

>> +++++++++++++++++++++++++++++++++++++++-------

>>  1 file changed, 137 insertions(+), 22 deletions(-)

>> 

>> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c

>> index 1d04e401..347ae7d 100644

>> --- a/drivers/bus/mhi/core/pm.c

>> +++ b/drivers/bus/mhi/core/pm.c

>> @@ -444,7 +444,7 @@ static int mhi_pm_mission_mode_transition(struct 

>> mhi_controller *mhi_cntrl)

>>  	return ret;

>>  }

>> 

>> -/* Handle SYS_ERR and Shutdown transitions */

>> +/* Handle shutdown transitions */

>>  static void mhi_pm_disable_transition(struct mhi_controller 

>> *mhi_cntrl,

>>  				      enum mhi_pm_state transition_state)

>>  {

>> @@ -460,10 +460,6 @@ static void mhi_pm_disable_transition(struct 

>> mhi_controller *mhi_cntrl,

>>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>  		to_mhi_pm_state_str(transition_state));

>> 

>> -	/* We must notify MHI control driver so it can clean up first */

>> -	if (transition_state == MHI_PM_SYS_ERR_PROCESS)

>> -		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

>> -

>>  	mutex_lock(&mhi_cntrl->pm_mutex);

>>  	write_lock_irq(&mhi_cntrl->pm_lock);

>>  	prev_state = mhi_cntrl->pm_state;

>> @@ -502,11 +498,8 @@ static void mhi_pm_disable_transition(struct 

>> mhi_controller *mhi_cntrl,

>>  							    MHICTRL_RESET_SHIFT,

>>  							    &in_reset) ||

>>  					!in_reset, timeout);

>> -		if ((!ret || in_reset) && cur_state == MHI_PM_SYS_ERR_PROCESS) {

>> +		if (!ret || in_reset)

>>  			dev_err(dev, "Device failed to exit MHI Reset state\n");

>> -			mutex_unlock(&mhi_cntrl->pm_mutex);

>> -			return;

>> -		}

>> 

>>  		/*

>>  		 * Device will clear BHI_INTVEC as a part of RESET processing,

>> @@ -566,19 +559,142 @@ static void mhi_pm_disable_transition(struct 

>> mhi_controller *mhi_cntrl,

>>  		er_ctxt->wp = er_ctxt->rbase;

>>  	}

>> 

>> -	if (cur_state == MHI_PM_SYS_ERR_PROCESS) {

>> -		mhi_ready_state_transition(mhi_cntrl);

>> -	} else {

>> -		/* Move to disable state */

>> -		write_lock_irq(&mhi_cntrl->pm_lock);

>> -		cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

>> -		write_unlock_irq(&mhi_cntrl->pm_lock);

>> -		if (unlikely(cur_state != MHI_PM_DISABLE))

>> -			dev_err(dev, "Error moving from PM state: %s to: %s\n",

>> -				to_mhi_pm_state_str(cur_state),

>> -				to_mhi_pm_state_str(MHI_PM_DISABLE));

>> +	/* Move to disable state */

>> +	write_lock_irq(&mhi_cntrl->pm_lock);

>> +	cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

>> +	write_unlock_irq(&mhi_cntrl->pm_lock);

>> +	if (unlikely(cur_state != MHI_PM_DISABLE))

>> +		dev_err(dev, "Error moving from PM state: %s to: %s\n",

>> +			to_mhi_pm_state_str(cur_state),

>> +			to_mhi_pm_state_str(MHI_PM_DISABLE));

>> +

>> +	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

>> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>> +		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

>> +

>> +	mutex_unlock(&mhi_cntrl->pm_mutex);

>> +}

>> +

>> +/* Handle system error transitions */

>> +static void mhi_pm_sys_error_transition(struct mhi_controller 

>> *mhi_cntrl)

>> +{

>> +	enum mhi_pm_state cur_state, prev_state;

>> +	struct mhi_event *mhi_event;

>> +	struct mhi_cmd_ctxt *cmd_ctxt;

>> +	struct mhi_cmd *mhi_cmd;

>> +	struct mhi_event_ctxt *er_ctxt;

>> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;

>> +	int ret, i;

>> +

>> +	dev_dbg(dev, "Transitioning from PM state: %s to: %s\n",

>> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>> +		to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

>> +

>> +	/* We must notify MHI control driver so it can clean up first */

>> +	mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

>> +

>> +	mutex_lock(&mhi_cntrl->pm_mutex);

>> +	write_lock_irq(&mhi_cntrl->pm_lock);

>> +	prev_state = mhi_cntrl->pm_state;

>> +	cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_SYS_ERR_PROCESS);

>> +	write_unlock_irq(&mhi_cntrl->pm_lock);

>> +

>> +	if (cur_state != MHI_PM_SYS_ERR_PROCESS) {

>> +		dev_err(dev, "Failed to transition from PM state: %s to: %s\n",

>> +			to_mhi_pm_state_str(cur_state),

>> +			to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

>> +		goto exit_sys_error_transition;

>> +	}

>> +

>> +	mhi_cntrl->ee = MHI_EE_DISABLE_TRANSITION;

>> +	mhi_cntrl->dev_state = MHI_STATE_RESET;

>> +

>> +	/* Wake up threads waiting for state transition */

>> +	wake_up_all(&mhi_cntrl->state_event);

>> +

>> +	/* Trigger MHI RESET so that the device will not access host memory 

>> */

>> +	if (MHI_REG_ACCESS_VALID(prev_state)) {

>> +		u32 in_reset = -1;

>> +		unsigned long timeout = msecs_to_jiffies(mhi_cntrl->timeout_ms);

>> +

>> +		dev_dbg(dev, "Triggering MHI Reset in device\n");

>> +		mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);

>> +

>> +		/* Wait for the reset bit to be cleared by the device */

>> +		ret = wait_event_timeout(mhi_cntrl->state_event,

>> +					 mhi_read_reg_field(mhi_cntrl,

>> +							    mhi_cntrl->regs,

>> +							    MHICTRL,

>> +							    MHICTRL_RESET_MASK,

>> +							    MHICTRL_RESET_SHIFT,

>> +							    &in_reset) ||

>> +					!in_reset, timeout);

>> +		if (!ret || in_reset) {

>> +			dev_err(dev, "Device failed to exit MHI Reset state\n");

>> +			goto exit_sys_error_transition;

>> +		}

>> +

>> +		/*

>> +		 * Device will clear BHI_INTVEC as a part of RESET processing,

>> +		 * hence re-program it

>> +		 */

>> +		mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);

>> +	}

>> +

>> +	dev_dbg(dev,

>> +		"Waiting for all pending event ring processing to complete\n");

>> +	mhi_event = mhi_cntrl->mhi_event;

>> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, mhi_event++) {

>> +		if (mhi_event->offload_ev)

>> +			continue;

>> +		tasklet_kill(&mhi_event->task);

>>  	}

>> 

>> +	/* Release lock and wait for all pending threads to complete */

>> +	mutex_unlock(&mhi_cntrl->pm_mutex);

>> +	dev_dbg(dev, "Waiting for all pending threads to complete\n");

>> +	wake_up_all(&mhi_cntrl->state_event);

>> +

>> +	dev_dbg(dev, "Reset all active channels and remove MHI devices\n");

>> +	device_for_each_child(mhi_cntrl->cntrl_dev, NULL, 

>> mhi_destroy_device);

>> +

>> +	mutex_lock(&mhi_cntrl->pm_mutex);

>> +

>> +	WARN_ON(atomic_read(&mhi_cntrl->dev_wake));

>> +	WARN_ON(atomic_read(&mhi_cntrl->pending_pkts));

>> +

>> +	/* Reset the ev rings and cmd rings */

>> +	dev_dbg(dev, "Resetting EV CTXT and CMD CTXT\n");

>> +	mhi_cmd = mhi_cntrl->mhi_cmd;

>> +	cmd_ctxt = mhi_cntrl->mhi_ctxt->cmd_ctxt;

>> +	for (i = 0; i < NR_OF_CMD_RINGS; i++, mhi_cmd++, cmd_ctxt++) {

>> +		struct mhi_ring *ring = &mhi_cmd->ring;

>> +

>> +		ring->rp = ring->base;

>> +		ring->wp = ring->base;

>> +		cmd_ctxt->rp = cmd_ctxt->rbase;

>> +		cmd_ctxt->wp = cmd_ctxt->rbase;

>> +	}

>> +

>> +	mhi_event = mhi_cntrl->mhi_event;

>> +	er_ctxt = mhi_cntrl->mhi_ctxt->er_ctxt;

>> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, er_ctxt++,

>> +	     mhi_event++) {

>> +		struct mhi_ring *ring = &mhi_event->ring;

>> +

>> +		/* Skip offload events */

>> +		if (mhi_event->offload_ev)

>> +			continue;

>> +

>> +		ring->rp = ring->base;

>> +		ring->wp = ring->base;

>> +		er_ctxt->rp = er_ctxt->rbase;

>> +		er_ctxt->wp = er_ctxt->rbase;

>> +	}

>> +

>> +	mhi_ready_state_transition(mhi_cntrl);

>> +

>> +exit_sys_error_transition:

>>  	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

>>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>  		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

>> @@ -666,8 +782,7 @@ void mhi_pm_st_worker(struct work_struct *work)

>>  			mhi_ready_state_transition(mhi_cntrl);

>>  			break;

>>  		case DEV_ST_TRANSITION_SYS_ERR:

>> -			mhi_pm_disable_transition

>> -				(mhi_cntrl, MHI_PM_SYS_ERR_PROCESS);

>> +			mhi_pm_sys_error_transition(mhi_cntrl);

>>  			break;

>>  		case DEV_ST_TRANSITION_DISABLE:

>>  			mhi_pm_disable_transition

>> --

>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 

>> Forum,

>> a Linux Foundation Collaborative Project

>> 


Thanks,
Bhaumik
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project
Manivannan Sadhasivam Oct. 31, 2020, 6:54 a.m. UTC | #9
Hi Bhaumik, 

On 31 October 2020 1:04:07 AM IST, Bhaumik Bhatt <bbhatt@codeaurora.org> wrote:
>Hi Mani,

>

>On 2020-10-30 07:06, Manivannan Sadhasivam wrote:

>> On Thu, Oct 29, 2020 at 09:10:55PM -0700, Bhaumik Bhatt wrote:

>>> Currently, there exist a set of if...else statements in the

>>> mhi_pm_disable_transition() function which make handling system

>>> error and disable transitions differently complex. To make that

>>> cleaner and facilitate differences in behavior, separate these

>>> two transitions for MHI host.

>>> 

>> 

>> And this results in a lot of duplicated code :/

>> 

>> Thanks,

>> Mani

>> 

>

>I knew this was coming. Mainly, we can avoid adding confusing if...else

>statements that plague the current mhi_pm_disable_transition() function

>

>and in

>return for some duplicate code, we can make handling separate use cases

>

>easier

>as they could pop-up anytime in the future.

>


If that happens then do it but now, please no. 

Thanks, 
Mani

>>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

>>> ---

>>>  drivers/bus/mhi/core/pm.c | 159 

>>> +++++++++++++++++++++++++++++++++++++++-------

>>>  1 file changed, 137 insertions(+), 22 deletions(-)

>>> 

>>> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c

>>> index 1d04e401..347ae7d 100644

>>> --- a/drivers/bus/mhi/core/pm.c

>>> +++ b/drivers/bus/mhi/core/pm.c

>>> @@ -444,7 +444,7 @@ static int mhi_pm_mission_mode_transition(struct

>

>>> mhi_controller *mhi_cntrl)

>>>  	return ret;

>>>  }

>>> 

>>> -/* Handle SYS_ERR and Shutdown transitions */

>>> +/* Handle shutdown transitions */

>>>  static void mhi_pm_disable_transition(struct mhi_controller 

>>> *mhi_cntrl,

>>>  				      enum mhi_pm_state transition_state)

>>>  {

>>> @@ -460,10 +460,6 @@ static void mhi_pm_disable_transition(struct 

>>> mhi_controller *mhi_cntrl,

>>>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>>  		to_mhi_pm_state_str(transition_state));

>>> 

>>> -	/* We must notify MHI control driver so it can clean up first */

>>> -	if (transition_state == MHI_PM_SYS_ERR_PROCESS)

>>> -		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

>>> -

>>>  	mutex_lock(&mhi_cntrl->pm_mutex);

>>>  	write_lock_irq(&mhi_cntrl->pm_lock);

>>>  	prev_state = mhi_cntrl->pm_state;

>>> @@ -502,11 +498,8 @@ static void mhi_pm_disable_transition(struct 

>>> mhi_controller *mhi_cntrl,

>>>  							    MHICTRL_RESET_SHIFT,

>>>  							    &in_reset) ||

>>>  					!in_reset, timeout);

>>> -		if ((!ret || in_reset) && cur_state == MHI_PM_SYS_ERR_PROCESS) {

>>> +		if (!ret || in_reset)

>>>  			dev_err(dev, "Device failed to exit MHI Reset state\n");

>>> -			mutex_unlock(&mhi_cntrl->pm_mutex);

>>> -			return;

>>> -		}

>>> 

>>>  		/*

>>>  		 * Device will clear BHI_INTVEC as a part of RESET processing,

>>> @@ -566,19 +559,142 @@ static void mhi_pm_disable_transition(struct 

>>> mhi_controller *mhi_cntrl,

>>>  		er_ctxt->wp = er_ctxt->rbase;

>>>  	}

>>> 

>>> -	if (cur_state == MHI_PM_SYS_ERR_PROCESS) {

>>> -		mhi_ready_state_transition(mhi_cntrl);

>>> -	} else {

>>> -		/* Move to disable state */

>>> -		write_lock_irq(&mhi_cntrl->pm_lock);

>>> -		cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

>>> -		write_unlock_irq(&mhi_cntrl->pm_lock);

>>> -		if (unlikely(cur_state != MHI_PM_DISABLE))

>>> -			dev_err(dev, "Error moving from PM state: %s to: %s\n",

>>> -				to_mhi_pm_state_str(cur_state),

>>> -				to_mhi_pm_state_str(MHI_PM_DISABLE));

>>> +	/* Move to disable state */

>>> +	write_lock_irq(&mhi_cntrl->pm_lock);

>>> +	cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

>>> +	write_unlock_irq(&mhi_cntrl->pm_lock);

>>> +	if (unlikely(cur_state != MHI_PM_DISABLE))

>>> +		dev_err(dev, "Error moving from PM state: %s to: %s\n",

>>> +			to_mhi_pm_state_str(cur_state),

>>> +			to_mhi_pm_state_str(MHI_PM_DISABLE));

>>> +

>>> +	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

>>> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>> +		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

>>> +

>>> +	mutex_unlock(&mhi_cntrl->pm_mutex);

>>> +}

>>> +

>>> +/* Handle system error transitions */

>>> +static void mhi_pm_sys_error_transition(struct mhi_controller 

>>> *mhi_cntrl)

>>> +{

>>> +	enum mhi_pm_state cur_state, prev_state;

>>> +	struct mhi_event *mhi_event;

>>> +	struct mhi_cmd_ctxt *cmd_ctxt;

>>> +	struct mhi_cmd *mhi_cmd;

>>> +	struct mhi_event_ctxt *er_ctxt;

>>> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;

>>> +	int ret, i;

>>> +

>>> +	dev_dbg(dev, "Transitioning from PM state: %s to: %s\n",

>>> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>> +		to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

>>> +

>>> +	/* We must notify MHI control driver so it can clean up first */

>>> +	mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

>>> +

>>> +	mutex_lock(&mhi_cntrl->pm_mutex);

>>> +	write_lock_irq(&mhi_cntrl->pm_lock);

>>> +	prev_state = mhi_cntrl->pm_state;

>>> +	cur_state = mhi_tryset_pm_state(mhi_cntrl,

>MHI_PM_SYS_ERR_PROCESS);

>>> +	write_unlock_irq(&mhi_cntrl->pm_lock);

>>> +

>>> +	if (cur_state != MHI_PM_SYS_ERR_PROCESS) {

>>> +		dev_err(dev, "Failed to transition from PM state: %s to: %s\n",

>>> +			to_mhi_pm_state_str(cur_state),

>>> +			to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

>>> +		goto exit_sys_error_transition;

>>> +	}

>>> +

>>> +	mhi_cntrl->ee = MHI_EE_DISABLE_TRANSITION;

>>> +	mhi_cntrl->dev_state = MHI_STATE_RESET;

>>> +

>>> +	/* Wake up threads waiting for state transition */

>>> +	wake_up_all(&mhi_cntrl->state_event);

>>> +

>>> +	/* Trigger MHI RESET so that the device will not access host

>memory 

>>> */

>>> +	if (MHI_REG_ACCESS_VALID(prev_state)) {

>>> +		u32 in_reset = -1;

>>> +		unsigned long timeout = msecs_to_jiffies(mhi_cntrl->timeout_ms);

>>> +

>>> +		dev_dbg(dev, "Triggering MHI Reset in device\n");

>>> +		mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);

>>> +

>>> +		/* Wait for the reset bit to be cleared by the device */

>>> +		ret = wait_event_timeout(mhi_cntrl->state_event,

>>> +					 mhi_read_reg_field(mhi_cntrl,

>>> +							    mhi_cntrl->regs,

>>> +							    MHICTRL,

>>> +							    MHICTRL_RESET_MASK,

>>> +							    MHICTRL_RESET_SHIFT,

>>> +							    &in_reset) ||

>>> +					!in_reset, timeout);

>>> +		if (!ret || in_reset) {

>>> +			dev_err(dev, "Device failed to exit MHI Reset state\n");

>>> +			goto exit_sys_error_transition;

>>> +		}

>>> +

>>> +		/*

>>> +		 * Device will clear BHI_INTVEC as a part of RESET processing,

>>> +		 * hence re-program it

>>> +		 */

>>> +		mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);

>>> +	}

>>> +

>>> +	dev_dbg(dev,

>>> +		"Waiting for all pending event ring processing to complete\n");

>>> +	mhi_event = mhi_cntrl->mhi_event;

>>> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, mhi_event++) {

>>> +		if (mhi_event->offload_ev)

>>> +			continue;

>>> +		tasklet_kill(&mhi_event->task);

>>>  	}

>>> 

>>> +	/* Release lock and wait for all pending threads to complete */

>>> +	mutex_unlock(&mhi_cntrl->pm_mutex);

>>> +	dev_dbg(dev, "Waiting for all pending threads to complete\n");

>>> +	wake_up_all(&mhi_cntrl->state_event);

>>> +

>>> +	dev_dbg(dev, "Reset all active channels and remove MHI

>devices\n");

>>> +	device_for_each_child(mhi_cntrl->cntrl_dev, NULL, 

>>> mhi_destroy_device);

>>> +

>>> +	mutex_lock(&mhi_cntrl->pm_mutex);

>>> +

>>> +	WARN_ON(atomic_read(&mhi_cntrl->dev_wake));

>>> +	WARN_ON(atomic_read(&mhi_cntrl->pending_pkts));

>>> +

>>> +	/* Reset the ev rings and cmd rings */

>>> +	dev_dbg(dev, "Resetting EV CTXT and CMD CTXT\n");

>>> +	mhi_cmd = mhi_cntrl->mhi_cmd;

>>> +	cmd_ctxt = mhi_cntrl->mhi_ctxt->cmd_ctxt;

>>> +	for (i = 0; i < NR_OF_CMD_RINGS; i++, mhi_cmd++, cmd_ctxt++) {

>>> +		struct mhi_ring *ring = &mhi_cmd->ring;

>>> +

>>> +		ring->rp = ring->base;

>>> +		ring->wp = ring->base;

>>> +		cmd_ctxt->rp = cmd_ctxt->rbase;

>>> +		cmd_ctxt->wp = cmd_ctxt->rbase;

>>> +	}

>>> +

>>> +	mhi_event = mhi_cntrl->mhi_event;

>>> +	er_ctxt = mhi_cntrl->mhi_ctxt->er_ctxt;

>>> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, er_ctxt++,

>>> +	     mhi_event++) {

>>> +		struct mhi_ring *ring = &mhi_event->ring;

>>> +

>>> +		/* Skip offload events */

>>> +		if (mhi_event->offload_ev)

>>> +			continue;

>>> +

>>> +		ring->rp = ring->base;

>>> +		ring->wp = ring->base;

>>> +		er_ctxt->rp = er_ctxt->rbase;

>>> +		er_ctxt->wp = er_ctxt->rbase;

>>> +	}

>>> +

>>> +	mhi_ready_state_transition(mhi_cntrl);

>>> +

>>> +exit_sys_error_transition:

>>>  	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

>>>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>>  		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

>>> @@ -666,8 +782,7 @@ void mhi_pm_st_worker(struct work_struct *work)

>>>  			mhi_ready_state_transition(mhi_cntrl);

>>>  			break;

>>>  		case DEV_ST_TRANSITION_SYS_ERR:

>>> -			mhi_pm_disable_transition

>>> -				(mhi_cntrl, MHI_PM_SYS_ERR_PROCESS);

>>> +			mhi_pm_sys_error_transition(mhi_cntrl);

>>>  			break;

>>>  		case DEV_ST_TRANSITION_DISABLE:

>>>  			mhi_pm_disable_transition

>>> --

>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 

>>> Forum,

>>> a Linux Foundation Collaborative Project

>>> 

>

>Thanks,

>Bhaumik


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Bhaumik Bhatt Nov. 2, 2020, 4:52 p.m. UTC | #10
Hi Mani,

On 2020-10-30 23:54, Manivannan Sadhasivam wrote:
> Hi Bhaumik,

> 

> On 31 October 2020 1:04:07 AM IST, Bhaumik Bhatt 

> <bbhatt@codeaurora.org> wrote:

>> Hi Mani,

>> 

>> On 2020-10-30 07:06, Manivannan Sadhasivam wrote:

>>> On Thu, Oct 29, 2020 at 09:10:55PM -0700, Bhaumik Bhatt wrote:

>>>> Currently, there exist a set of if...else statements in the

>>>> mhi_pm_disable_transition() function which make handling system

>>>> error and disable transitions differently complex. To make that

>>>> cleaner and facilitate differences in behavior, separate these

>>>> two transitions for MHI host.

>>>> 

>>> 

>>> And this results in a lot of duplicated code :/

>>> 

>>> Thanks,

>>> Mani

>>> 

>> 

>> I knew this was coming. Mainly, we can avoid adding confusing 

>> if...else

>> statements that plague the current mhi_pm_disable_transition() 

>> function

>> 

>> and in

>> return for some duplicate code, we can make handling separate use 

>> cases

>> 

>> easier

>> as they could pop-up anytime in the future.

>> 

> 

> If that happens then do it but now, please no.

> 

> Thanks,

> Mani

> 

It had to be done for 11/12 and 12/12 (last two) patches of the series. 
It's a
much cleaner approach and allows us to handle states better. We should 
be moving
the dev_state and EE to "Reset" and "Disable" states soon enough when a 
shutdown
is initiated and we can resolve bugs such as freeing the IRQs for a 
shutdown
sooner as well. Since these differences in shutdown versus SYS_ERROR 
processing
are already increasingly apparent and more could come, this patch had to 
step
in.

bus: mhi: core: Mark and maintain device states early on after power 
down
bus: mhi: core: Remove MHI event ring IRQ handlers when powering down
>>>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

>>>> ---

>>>>  drivers/bus/mhi/core/pm.c | 159

>>>> +++++++++++++++++++++++++++++++++++++++-------

>>>>  1 file changed, 137 insertions(+), 22 deletions(-)

>>>> 

>>>> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c

>>>> index 1d04e401..347ae7d 100644

>>>> --- a/drivers/bus/mhi/core/pm.c

>>>> +++ b/drivers/bus/mhi/core/pm.c

>>>> @@ -444,7 +444,7 @@ static int mhi_pm_mission_mode_transition(struct

>> 

>>>> mhi_controller *mhi_cntrl)

>>>>  	return ret;

>>>>  }

>>>> 

>>>> -/* Handle SYS_ERR and Shutdown transitions */

>>>> +/* Handle shutdown transitions */

>>>>  static void mhi_pm_disable_transition(struct mhi_controller

>>>> *mhi_cntrl,

>>>>  				      enum mhi_pm_state transition_state)

>>>>  {

>>>> @@ -460,10 +460,6 @@ static void mhi_pm_disable_transition(struct

>>>> mhi_controller *mhi_cntrl,

>>>>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>>>  		to_mhi_pm_state_str(transition_state));

>>>> 

>>>> -	/* We must notify MHI control driver so it can clean up first */

>>>> -	if (transition_state == MHI_PM_SYS_ERR_PROCESS)

>>>> -		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

>>>> -

>>>>  	mutex_lock(&mhi_cntrl->pm_mutex);

>>>>  	write_lock_irq(&mhi_cntrl->pm_lock);

>>>>  	prev_state = mhi_cntrl->pm_state;

>>>> @@ -502,11 +498,8 @@ static void mhi_pm_disable_transition(struct

>>>> mhi_controller *mhi_cntrl,

>>>>  							    MHICTRL_RESET_SHIFT,

>>>>  							    &in_reset) ||

>>>>  					!in_reset, timeout);

>>>> -		if ((!ret || in_reset) && cur_state == MHI_PM_SYS_ERR_PROCESS) {

>>>> +		if (!ret || in_reset)

>>>>  			dev_err(dev, "Device failed to exit MHI Reset state\n");

>>>> -			mutex_unlock(&mhi_cntrl->pm_mutex);

>>>> -			return;

>>>> -		}

>>>> 

>>>>  		/*

>>>>  		 * Device will clear BHI_INTVEC as a part of RESET processing,

>>>> @@ -566,19 +559,142 @@ static void mhi_pm_disable_transition(struct

>>>> mhi_controller *mhi_cntrl,

>>>>  		er_ctxt->wp = er_ctxt->rbase;

>>>>  	}

>>>> 

>>>> -	if (cur_state == MHI_PM_SYS_ERR_PROCESS) {

>>>> -		mhi_ready_state_transition(mhi_cntrl);

>>>> -	} else {

>>>> -		/* Move to disable state */

>>>> -		write_lock_irq(&mhi_cntrl->pm_lock);

>>>> -		cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

>>>> -		write_unlock_irq(&mhi_cntrl->pm_lock);

>>>> -		if (unlikely(cur_state != MHI_PM_DISABLE))

>>>> -			dev_err(dev, "Error moving from PM state: %s to: %s\n",

>>>> -				to_mhi_pm_state_str(cur_state),

>>>> -				to_mhi_pm_state_str(MHI_PM_DISABLE));

>>>> +	/* Move to disable state */

>>>> +	write_lock_irq(&mhi_cntrl->pm_lock);

>>>> +	cur_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_DISABLE);

>>>> +	write_unlock_irq(&mhi_cntrl->pm_lock);

>>>> +	if (unlikely(cur_state != MHI_PM_DISABLE))

>>>> +		dev_err(dev, "Error moving from PM state: %s to: %s\n",

>>>> +			to_mhi_pm_state_str(cur_state),

>>>> +			to_mhi_pm_state_str(MHI_PM_DISABLE));

>>>> +

>>>> +	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

>>>> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>>> +		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

>>>> +

>>>> +	mutex_unlock(&mhi_cntrl->pm_mutex);

>>>> +}

>>>> +

>>>> +/* Handle system error transitions */

>>>> +static void mhi_pm_sys_error_transition(struct mhi_controller

>>>> *mhi_cntrl)

>>>> +{

>>>> +	enum mhi_pm_state cur_state, prev_state;

>>>> +	struct mhi_event *mhi_event;

>>>> +	struct mhi_cmd_ctxt *cmd_ctxt;

>>>> +	struct mhi_cmd *mhi_cmd;

>>>> +	struct mhi_event_ctxt *er_ctxt;

>>>> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;

>>>> +	int ret, i;

>>>> +

>>>> +	dev_dbg(dev, "Transitioning from PM state: %s to: %s\n",

>>>> +		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>>> +		to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

>>>> +

>>>> +	/* We must notify MHI control driver so it can clean up first */

>>>> +	mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

>>>> +

>>>> +	mutex_lock(&mhi_cntrl->pm_mutex);

>>>> +	write_lock_irq(&mhi_cntrl->pm_lock);

>>>> +	prev_state = mhi_cntrl->pm_state;

>>>> +	cur_state = mhi_tryset_pm_state(mhi_cntrl,

>> MHI_PM_SYS_ERR_PROCESS);

>>>> +	write_unlock_irq(&mhi_cntrl->pm_lock);

>>>> +

>>>> +	if (cur_state != MHI_PM_SYS_ERR_PROCESS) {

>>>> +		dev_err(dev, "Failed to transition from PM state: %s to: %s\n",

>>>> +			to_mhi_pm_state_str(cur_state),

>>>> +			to_mhi_pm_state_str(MHI_PM_SYS_ERR_PROCESS));

>>>> +		goto exit_sys_error_transition;

>>>> +	}

>>>> +

>>>> +	mhi_cntrl->ee = MHI_EE_DISABLE_TRANSITION;

>>>> +	mhi_cntrl->dev_state = MHI_STATE_RESET;

>>>> +

>>>> +	/* Wake up threads waiting for state transition */

>>>> +	wake_up_all(&mhi_cntrl->state_event);

>>>> +

>>>> +	/* Trigger MHI RESET so that the device will not access host

>> memory

>>>> */

>>>> +	if (MHI_REG_ACCESS_VALID(prev_state)) {

>>>> +		u32 in_reset = -1;

>>>> +		unsigned long timeout = msecs_to_jiffies(mhi_cntrl->timeout_ms);

>>>> +

>>>> +		dev_dbg(dev, "Triggering MHI Reset in device\n");

>>>> +		mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);

>>>> +

>>>> +		/* Wait for the reset bit to be cleared by the device */

>>>> +		ret = wait_event_timeout(mhi_cntrl->state_event,

>>>> +					 mhi_read_reg_field(mhi_cntrl,

>>>> +							    mhi_cntrl->regs,

>>>> +							    MHICTRL,

>>>> +							    MHICTRL_RESET_MASK,

>>>> +							    MHICTRL_RESET_SHIFT,

>>>> +							    &in_reset) ||

>>>> +					!in_reset, timeout);

>>>> +		if (!ret || in_reset) {

>>>> +			dev_err(dev, "Device failed to exit MHI Reset state\n");

>>>> +			goto exit_sys_error_transition;

>>>> +		}

>>>> +

>>>> +		/*

>>>> +		 * Device will clear BHI_INTVEC as a part of RESET processing,

>>>> +		 * hence re-program it

>>>> +		 */

>>>> +		mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);

>>>> +	}

>>>> +

>>>> +	dev_dbg(dev,

>>>> +		"Waiting for all pending event ring processing to complete\n");

>>>> +	mhi_event = mhi_cntrl->mhi_event;

>>>> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, mhi_event++) {

>>>> +		if (mhi_event->offload_ev)

>>>> +			continue;

>>>> +		tasklet_kill(&mhi_event->task);

>>>>  	}

>>>> 

>>>> +	/* Release lock and wait for all pending threads to complete */

>>>> +	mutex_unlock(&mhi_cntrl->pm_mutex);

>>>> +	dev_dbg(dev, "Waiting for all pending threads to complete\n");

>>>> +	wake_up_all(&mhi_cntrl->state_event);

>>>> +

>>>> +	dev_dbg(dev, "Reset all active channels and remove MHI

>> devices\n");

>>>> +	device_for_each_child(mhi_cntrl->cntrl_dev, NULL,

>>>> mhi_destroy_device);

>>>> +

>>>> +	mutex_lock(&mhi_cntrl->pm_mutex);

>>>> +

>>>> +	WARN_ON(atomic_read(&mhi_cntrl->dev_wake));

>>>> +	WARN_ON(atomic_read(&mhi_cntrl->pending_pkts));

>>>> +

>>>> +	/* Reset the ev rings and cmd rings */

>>>> +	dev_dbg(dev, "Resetting EV CTXT and CMD CTXT\n");

>>>> +	mhi_cmd = mhi_cntrl->mhi_cmd;

>>>> +	cmd_ctxt = mhi_cntrl->mhi_ctxt->cmd_ctxt;

>>>> +	for (i = 0; i < NR_OF_CMD_RINGS; i++, mhi_cmd++, cmd_ctxt++) {

>>>> +		struct mhi_ring *ring = &mhi_cmd->ring;

>>>> +

>>>> +		ring->rp = ring->base;

>>>> +		ring->wp = ring->base;

>>>> +		cmd_ctxt->rp = cmd_ctxt->rbase;

>>>> +		cmd_ctxt->wp = cmd_ctxt->rbase;

>>>> +	}

>>>> +

>>>> +	mhi_event = mhi_cntrl->mhi_event;

>>>> +	er_ctxt = mhi_cntrl->mhi_ctxt->er_ctxt;

>>>> +	for (i = 0; i < mhi_cntrl->total_ev_rings; i++, er_ctxt++,

>>>> +	     mhi_event++) {

>>>> +		struct mhi_ring *ring = &mhi_event->ring;

>>>> +

>>>> +		/* Skip offload events */

>>>> +		if (mhi_event->offload_ev)

>>>> +			continue;

>>>> +

>>>> +		ring->rp = ring->base;

>>>> +		ring->wp = ring->base;

>>>> +		er_ctxt->rp = er_ctxt->rbase;

>>>> +		er_ctxt->wp = er_ctxt->rbase;

>>>> +	}

>>>> +

>>>> +	mhi_ready_state_transition(mhi_cntrl);

>>>> +

>>>> +exit_sys_error_transition:

>>>>  	dev_dbg(dev, "Exiting with PM state: %s, MHI state: %s\n",

>>>>  		to_mhi_pm_state_str(mhi_cntrl->pm_state),

>>>>  		TO_MHI_STATE_STR(mhi_cntrl->dev_state));

>>>> @@ -666,8 +782,7 @@ void mhi_pm_st_worker(struct work_struct *work)

>>>>  			mhi_ready_state_transition(mhi_cntrl);

>>>>  			break;

>>>>  		case DEV_ST_TRANSITION_SYS_ERR:

>>>> -			mhi_pm_disable_transition

>>>> -				(mhi_cntrl, MHI_PM_SYS_ERR_PROCESS);

>>>> +			mhi_pm_sys_error_transition(mhi_cntrl);

>>>>  			break;

>>>>  		case DEV_ST_TRANSITION_DISABLE:

>>>>  			mhi_pm_disable_transition

>>>> --

>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora

>>>> Forum,

>>>> a Linux Foundation Collaborative Project

>>>> 

>> 

>> Thanks,

>> Bhaumik


Thanks,
Bhaumik
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project