mbox series

[0/8] UFS patches for kernel 6.11

Message ID 20240617210844.337476-1-bvanassche@acm.org
Headers show
Series UFS patches for kernel 6.11 | expand

Message

Bart Van Assche June 17, 2024, 9:07 p.m. UTC
Hi Martin,

Please consider this series of UFS driver patches for the next merge window.

Thanks,

Bart.

Bart Van Assche (8):
  scsi: ufs: Initialize struct uic_command once
  scsi: ufs: Remove two constants
  scsi: ufs: Inline ufshcd_mcq_vops_get_hba_mac()
  scsi: ufs: Make .get_hba_mac() optional
  scsi: ufs: Declare ufshcd_mcq_poll_cqe_lock() once
  scsi: ufs: Make ufshcd_poll() complain about unsupported arguments
  scsi: ufs: Make the polling code report which command has been
    completed
  scsi: ufs: Check for completion from the timeout handler

 drivers/ufs/core/ufs-mcq.c      |  45 +++++++----
 drivers/ufs/core/ufshcd-priv.h  |  14 +---
 drivers/ufs/core/ufshcd.c       | 131 ++++++++++++++++++++------------
 drivers/ufs/host/ufs-mediatek.c |   2 +-
 drivers/ufs/host/ufs-qcom.c     |   2 +-
 include/ufs/ufshcd.h            |  11 ++-
 include/ufs/ufshci.h            |   2 +-
 7 files changed, 126 insertions(+), 81 deletions(-)

Comments

Daejun Park June 18, 2024, 1:28 a.m. UTC | #1
Hi Bart,

> UFSHCI controllers that are compliant with the UFSHCI 4.0 standard report
> the maximum number of supported commands in the controller capabilities
> register. Use that value if .get_hba_mac == NULL.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/ufs/core/ufs-mcq.c 12 +++++++-----
> include/ufs/ufshcd.h        4 +++-
> include/ufs/ufshci.h        2 +-
> 3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index 0482c7a1e419..d6f966f4abef 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -138,7 +138,6 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_queue_cfg_addr);
>   *
>   * MAC - Max. Active Command of the Host Controller (HC)
>   * HC wouldn't send more than this commands to the device.
> - * It is mandatory to implement get_hba_mac() to enable MCQ mode.
>   * Calculates and adjusts the queue depth based on the depth
>   * supported by the HC and ufs device.
>   */
> @@ -146,10 +145,13 @@ int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba)
> {
>          int mac = -EOPNOTSUPP;
This initialization can be removed.

> 
> -        if (!hba->vops !hba->vops->get_hba_mac)
> -                goto err;
> -
> -        mac = hba->vops->get_hba_mac(hba);
> +        if (!hba->vops !hba->vops->get_hba_mac) {
> +                hba->capabilities =
> +                        ufshcd_readl(hba, REG_CONTROLLER_CAPABILITIES);
> +                mac = (hba->capabilities & MASK_TRANSFER_REQUESTS_SLOTS) + 1;
> +        } else {
> +                mac = hba->vops->get_hba_mac(hba);
> +        }
>          if (mac < 0)
>                  goto err;
> 
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index d4d63507d090..d32637d267f3 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -325,7 +325,9 @@ struct ufs_pwr_mode_info {
>   * @event_notify: called to notify important events
>   * @reinit_notify: called to notify reinit of UFSHCD during max gear switch
>   * @mcq_config_resource: called to configure MCQ platform resources
> - * @get_hba_mac: called to get vendor specific mac value, mandatory for mcq mode
> + * @get_hba_mac: reports maximum number of outstanding commands supported by
> + *        the controller. Should be implemented for UFSHCI 4.0 or later
> + *        controllers that are not compliant with the UFSHCI 4.0 specification.
>   * @op_runtime_config: called to config Operation and runtime regs Pointers
>   * @get_outstanding_cqs: called to get outstanding completion queues
>   * @config_esi: called to config Event Specific Interrupt
> diff --git a/include/ufs/ufshci.h b/include/ufs/ufshci.h
> index c50f92bf2e1d..899077bba2d2 100644
> --- a/include/ufs/ufshci.h
> +++ b/include/ufs/ufshci.h
> @@ -67,7 +67,7 @@ enum {
> 
> /* Controller capability masks */
> enum {
> -        MASK_TRANSFER_REQUESTS_SLOTS                = 0x0000001F,
> +        MASK_TRANSFER_REQUESTS_SLOTS                = 0x000000FF,
>          MASK_NUMBER_OUTSTANDING_RTT                = 0x0000FF00,
>          MASK_TASK_MANAGEMENT_REQUEST_SLOTS        = 0x00070000,
>          MASK_EHSLUTRD_SUPPORTED                        = 0x00400000,

Reviewed-by: Daejun Park <daejun7.park@samsung.com>

Thanks,
Daejun
Avri Altman June 18, 2024, 6:23 a.m. UTC | #2
> Make ufshcd_mcq_decide_queue_depth() easier to read by inlining
> ufshcd_mcq_vops_get_hba_mac().
This changes its functionality by making it non-mandatory.
Maybe relate to that in the commit log.
Also, it would make sense to squash it to the next patch, so your line of reasoning is clear.

Thanks,
Avri

> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufs-mcq.c     | 18 +++++++++++-------
>  drivers/ufs/core/ufshcd-priv.h |  8 --------
>  2 files changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c index
> 4bcae410c268..0482c7a1e419 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -144,14 +144,14 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_queue_cfg_addr);
>   */
>  int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba)  {
> -       int mac;
> +       int mac = -EOPNOTSUPP;
> 
> -       /* Mandatory to implement get_hba_mac() */
> -       mac = ufshcd_mcq_vops_get_hba_mac(hba);
> -       if (mac < 0) {
> -               dev_err(hba->dev, "Failed to get mac, err=%d\n", mac);
> -               return mac;
> -       }
> +       if (!hba->vops || !hba->vops->get_hba_mac)
> +               goto err;
> +
> +       mac = hba->vops->get_hba_mac(hba);
> +       if (mac < 0)
> +               goto err;
> 
>         WARN_ON_ONCE(!hba->dev_info.bqueuedepth);
>         /*
> @@ -160,6 +160,10 @@ int ufshcd_mcq_decide_queue_depth(struct ufs_hba
> *hba)
>          * shared queuing architecture is enabled.
>          */
>         return min_t(int, mac, hba->dev_info.bqueuedepth);
> +
> +err:
> +       dev_err(hba->dev, "Failed to get mac, err=%d\n", mac);
> +       return mac;
>  }
> 
>  static int ufshcd_mcq_config_nr_queues(struct ufs_hba *hba) diff --git
> a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h index
> f42d99ce5bf1..a1add22205db 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -255,14 +255,6 @@ static inline int
> ufshcd_vops_mcq_config_resource(struct ufs_hba *hba)
>         return -EOPNOTSUPP;
>  }
> 
> -static inline int ufshcd_mcq_vops_get_hba_mac(struct ufs_hba *hba) -{
> -       if (hba->vops && hba->vops->get_hba_mac)
> -               return hba->vops->get_hba_mac(hba);
> -
> -       return -EOPNOTSUPP;
> -}
> -
>  static inline int ufshcd_mcq_vops_op_runtime_config(struct ufs_hba *hba)  {
>         if (hba->vops && hba->vops->op_runtime_config)
Bart Van Assche June 18, 2024, 4:14 p.m. UTC | #3
On 6/17/24 11:23 PM, Avri Altman wrote:
> 
>> Make ufshcd_mcq_decide_queue_depth() easier to read by inlining
>> ufshcd_mcq_vops_get_hba_mac().
 >
> This changes its functionality by making it non-mandatory.
> Maybe relate to that in the commit log.

I do not agree. Even with this patch applied, .get_hba_mac() is still
mandatory.

> Also, it would make sense to squash it to the next patch, so your line of reasoning is clear.

I prefer to keep the inlining (no functional change) separate from the
patch that introduces a behavior change.

Thanks,

Bart.
Bart Van Assche June 18, 2024, 4:15 p.m. UTC | #4
On 6/17/24 6:25 PM, Daejun Park wrote:
>> @@ -4155,7 +4154,11 @@ EXPORT_SYMBOL_GPL(ufshcd_dme_set_attr);
>> int ufshcd_dme_get_attr(struct ufs_hba *hba, u32 attr_sel,
>>                           u32 *mib_val, u8 peer)
>> {
>> -        struct uic_command uic_cmd = {0};
>> +        struct uic_command uic_cmd = {
>> +                .command = peer ? UIC_CMD_DME_PEER_GET : UIC_CMD_DME_GET,
>> +                .argument1 = attr_sel,
>> +
> Empty line.

Thanks for the feedback. I will remove this empty line before I repost this patch
series.

Bart.
Bart Van Assche June 18, 2024, 4:17 p.m. UTC | #5
On 6/17/24 6:28 PM, Daejun Park wrote:
>> @@ -146,10 +145,13 @@ int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba)
>> {
>>           int mac = -EOPNOTSUPP;
> This initialization can be removed.

I will remove the initialization.

Thanks,

Bart.
Manivannan Sadhasivam June 19, 2024, 6:58 a.m. UTC | #6
On Mon, Jun 17, 2024 at 02:07:41PM -0700, Bart Van Assche wrote:
> The SCSI host template members .cmd_per_lun and .can_queue are copied
> into the SCSI host data structure. Before these are used, these are
> overwritten by ufshcd_init(). Hence, this patch does not change any
> functionality.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

- Mani

> ---
>  drivers/ufs/core/ufshcd.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 5d784876513e..7761ccca2115 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -164,8 +164,6 @@ EXPORT_SYMBOL_GPL(ufshcd_dump_regs);
>  enum {
>  	UFSHCD_MAX_CHANNEL	= 0,
>  	UFSHCD_MAX_ID		= 1,
> -	UFSHCD_CMD_PER_LUN	= 32 - UFSHCD_NUM_RESERVED,
> -	UFSHCD_CAN_QUEUE	= 32 - UFSHCD_NUM_RESERVED,
>  };
>  
>  static const char *const ufshcd_state_name[] = {
> @@ -8959,8 +8957,6 @@ static const struct scsi_host_template ufshcd_driver_template = {
>  	.eh_timed_out		= ufshcd_eh_timed_out,
>  	.this_id		= -1,
>  	.sg_tablesize		= SG_ALL,
> -	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
> -	.can_queue		= UFSHCD_CAN_QUEUE,
>  	.max_segment_size	= PRDT_DATA_BYTE_COUNT_MAX,
>  	.max_sectors		= SZ_1M / SECTOR_SIZE,
>  	.max_host_blocked	= 1,
Manivannan Sadhasivam June 19, 2024, 7:13 a.m. UTC | #7
On Mon, Jun 17, 2024 at 02:07:43PM -0700, Bart Van Assche wrote:
> UFSHCI controllers that are compliant with the UFSHCI 4.0 standard report
> the maximum number of supported commands in the controller capabilities
> register. Use that value if .get_hba_mac == NULL.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufs-mcq.c | 12 +++++++-----
>  include/ufs/ufshcd.h       |  4 +++-
>  include/ufs/ufshci.h       |  2 +-
>  3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index 0482c7a1e419..d6f966f4abef 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -138,7 +138,6 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_queue_cfg_addr);
>   *
>   * MAC - Max. Active Command of the Host Controller (HC)
>   * HC wouldn't send more than this commands to the device.
> - * It is mandatory to implement get_hba_mac() to enable MCQ mode.
>   * Calculates and adjusts the queue depth based on the depth
>   * supported by the HC and ufs device.
>   */
> @@ -146,10 +145,13 @@ int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba)
>  {
>  	int mac = -EOPNOTSUPP;
>  
> -	if (!hba->vops || !hba->vops->get_hba_mac)
> -		goto err;
> -
> -	mac = hba->vops->get_hba_mac(hba);
> +	if (!hba->vops || !hba->vops->get_hba_mac) {
> +		hba->capabilities =
> +			ufshcd_readl(hba, REG_CONTROLLER_CAPABILITIES);
> +		mac = (hba->capabilities & MASK_TRANSFER_REQUESTS_SLOTS) + 1;
> +	} else {
> +		mac = hba->vops->get_hba_mac(hba);
> +	}
>  	if (mac < 0)
>  		goto err;
>  
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index d4d63507d090..d32637d267f3 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -325,7 +325,9 @@ struct ufs_pwr_mode_info {
>   * @event_notify: called to notify important events
>   * @reinit_notify: called to notify reinit of UFSHCD during max gear switch
>   * @mcq_config_resource: called to configure MCQ platform resources
> - * @get_hba_mac: called to get vendor specific mac value, mandatory for mcq mode
> + * @get_hba_mac: reports maximum number of outstanding commands supported by
> + *	the controller. Should be implemented for UFSHCI 4.0 or later
> + *	controllers that are not compliant with the UFSHCI 4.0 specification.
>   * @op_runtime_config: called to config Operation and runtime regs Pointers
>   * @get_outstanding_cqs: called to get outstanding completion queues
>   * @config_esi: called to config Event Specific Interrupt
> diff --git a/include/ufs/ufshci.h b/include/ufs/ufshci.h
> index c50f92bf2e1d..899077bba2d2 100644
> --- a/include/ufs/ufshci.h
> +++ b/include/ufs/ufshci.h
> @@ -67,7 +67,7 @@ enum {
>  
>  /* Controller capability masks */
>  enum {
> -	MASK_TRANSFER_REQUESTS_SLOTS		= 0x0000001F,
> +	MASK_TRANSFER_REQUESTS_SLOTS		= 0x000000FF,

This should be a separate fix that comes before this patch. But I believe this
came up during MCQ review as well and I don't remember what was the reply from
Can. 0x1F is the mask for SDB mode and 0xFF is the mask for MCQ mode.

Can, can you comment more?

- Mani
Manivannan Sadhasivam June 19, 2024, 7:57 a.m. UTC | #8
+ Can

On Wed, Jun 19, 2024 at 12:43:29PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Jun 17, 2024 at 02:07:43PM -0700, Bart Van Assche wrote:
> > UFSHCI controllers that are compliant with the UFSHCI 4.0 standard report
> > the maximum number of supported commands in the controller capabilities
> > register. Use that value if .get_hba_mac == NULL.
> > 
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > ---
> >  drivers/ufs/core/ufs-mcq.c | 12 +++++++-----
> >  include/ufs/ufshcd.h       |  4 +++-
> >  include/ufs/ufshci.h       |  2 +-
> >  3 files changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> > index 0482c7a1e419..d6f966f4abef 100644
> > --- a/drivers/ufs/core/ufs-mcq.c
> > +++ b/drivers/ufs/core/ufs-mcq.c
> > @@ -138,7 +138,6 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_queue_cfg_addr);
> >   *
> >   * MAC - Max. Active Command of the Host Controller (HC)
> >   * HC wouldn't send more than this commands to the device.
> > - * It is mandatory to implement get_hba_mac() to enable MCQ mode.
> >   * Calculates and adjusts the queue depth based on the depth
> >   * supported by the HC and ufs device.
> >   */
> > @@ -146,10 +145,13 @@ int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba)
> >  {
> >  	int mac = -EOPNOTSUPP;
> >  
> > -	if (!hba->vops || !hba->vops->get_hba_mac)
> > -		goto err;
> > -
> > -	mac = hba->vops->get_hba_mac(hba);
> > +	if (!hba->vops || !hba->vops->get_hba_mac) {
> > +		hba->capabilities =
> > +			ufshcd_readl(hba, REG_CONTROLLER_CAPABILITIES);
> > +		mac = (hba->capabilities & MASK_TRANSFER_REQUESTS_SLOTS) + 1;
> > +	} else {
> > +		mac = hba->vops->get_hba_mac(hba);
> > +	}
> >  	if (mac < 0)
> >  		goto err;
> >  
> > diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> > index d4d63507d090..d32637d267f3 100644
> > --- a/include/ufs/ufshcd.h
> > +++ b/include/ufs/ufshcd.h
> > @@ -325,7 +325,9 @@ struct ufs_pwr_mode_info {
> >   * @event_notify: called to notify important events
> >   * @reinit_notify: called to notify reinit of UFSHCD during max gear switch
> >   * @mcq_config_resource: called to configure MCQ platform resources
> > - * @get_hba_mac: called to get vendor specific mac value, mandatory for mcq mode
> > + * @get_hba_mac: reports maximum number of outstanding commands supported by
> > + *	the controller. Should be implemented for UFSHCI 4.0 or later
> > + *	controllers that are not compliant with the UFSHCI 4.0 specification.
> >   * @op_runtime_config: called to config Operation and runtime regs Pointers
> >   * @get_outstanding_cqs: called to get outstanding completion queues
> >   * @config_esi: called to config Event Specific Interrupt
> > diff --git a/include/ufs/ufshci.h b/include/ufs/ufshci.h
> > index c50f92bf2e1d..899077bba2d2 100644
> > --- a/include/ufs/ufshci.h
> > +++ b/include/ufs/ufshci.h
> > @@ -67,7 +67,7 @@ enum {
> >  
> >  /* Controller capability masks */
> >  enum {
> > -	MASK_TRANSFER_REQUESTS_SLOTS		= 0x0000001F,
> > +	MASK_TRANSFER_REQUESTS_SLOTS		= 0x000000FF,
> 
> This should be a separate fix that comes before this patch. But I believe this
> came up during MCQ review as well and I don't remember what was the reply from
> Can. 0x1F is the mask for SDB mode and 0xFF is the mask for MCQ mode.
> 
> Can, can you comment more?
> 

Oops. Can is not CCed. Added now.

- Mani


> - Mani
> 
> -- 
> மணிவண்ணன் சதாசிவம்
Peter Wang (王信友) June 21, 2024, 3:32 a.m. UTC | #9
On Wed, 2024-06-19 at 13:27 +0530, Manivannan Sadhasivam wrote:
> > > --- a/include/ufs/ufshci.h
> > > +++ b/include/ufs/ufshci.h
> > > @@ -67,7 +67,7 @@ enum {
> > >  
> > >  /* Controller capability masks */
> > >  enum {
> > > -MASK_TRANSFER_REQUESTS_SLOTS= 0x0000001F,
> > > +MASK_TRANSFER_REQUESTS_SLOTS= 0x000000FF,
> > 
> > This should be a separate fix that comes before this patch. But I
> believe this
> > came up during MCQ review as well and I don't remember what was the
> reply from
> > Can. 0x1F is the mask for SDB mode and 0xFF is the mask for MCQ
> mode.
> > 
> > Can, can you comment more?
> > 
> 

Hi Bart and Mani,

It is correct that 0x1F is SDB mode.
ufshcd_mcq_decide_queue_depth is running before mcq enable.
So that value read is still SDB value, not MCQ value.


Thanks.
Peter


> Oops. Can is not CCed. Added now.
> 
> - Mani
> 
> 
> > - Mani
> > 
> > -- 
> > மணிவண்ணன் சதாசிவம்
> 
> -- 
> மணிவண்ணன் சதாசிவம்
Bart Van Assche June 21, 2024, 5:23 p.m. UTC | #10
On 6/20/24 11:54 PM, Peter Wang (王信友) wrote:
>    Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000194
>    pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14
>    lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40
> [ufs_mediatek_mod_ise]
>     do_mem_abort+0x58/0x118
>     el1_abort+0x3c/0x5c
>     el1h_64_sync_handler+0x54/0x90
>     el1h_64_sync+0x68/0x6c
>     blk_mq_unique_tag+0x8/0x14
>     ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise]
>     process_one_work+0x208/0x4fc
>     worker_thread+0x228/0x438
>     kthread+0x104/0x1d4
>     ret_from_fork+0x10/0x20

Hi Peter,

The above backtrace can only occur with MCQ enabled. The backtrace I
posted was triggered on a system with a UFSHCI 3.0 controller (no MCQ).
So I think that the backtraces have different root causes and hence that
different patches are required to fix both crashes.

Thanks,

Bart.
Peter Wang (王信友) June 24, 2024, 8:39 a.m. UTC | #11
On Sun, 2024-06-23 at 19:03 +0530, manivannan.sadhasivam@linaro.org
wrote:
>  	 
> 
> > 
> > Hi Bart and Mani,
> > 
> > It is correct that 0x1F is SDB mode.
> > ufshcd_mcq_decide_queue_depth is running before mcq enable.
> > So that value read is still SDB value, not MCQ value.
> > 
> 
> I don't quite understand. My concern was that if we change the mask
> for MCQ,
> then existing 'nutrs' value for SDB could be impacted with this
> change.
> 
> Perhaps we should use different masks?
> 
> - Mani
> 

Hi Mani,

Yes, it is better use different mask with different mode.

And we can only use 0xFF mask if 0x300[0] = 1 (MCQ enable)
use 0x1F mask if 0x300[0] = 0 (Single doorbell mode)

Mediatek design in MCQ mode is 64, Single doorbell mode is 32.

Thanks.
Peter
Peter Wang (王信友) June 25, 2024, 10:04 a.m. UTC | #12
On Mon, 2024-06-24 at 11:12 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 6/24/24 1:54 AM, Peter Wang (王信友) wrote:
> > Your backtrace is Single doorbell mode. But I am curious that
> > how could it happen if complete a cmd twice and get null pointer
> > next time queuecommand? could you describe the exactly flow?
> 
> SCSI commands are completed only once. See also the code in the SCSI
> core that sets the SCMD_STATE_COMPLETE bit:
> 
> $ git grep -nH 'test_and_set_bit(SCMD_STATE_COMPLETE'
> drivers/scsi/scsi_error.c:362:if
> (test_and_set_bit(SCMD_STATE_COMPLETE, 
> &scmd->state))
> drivers/scsi/scsi_lib.c:1716:if 
> (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
> 
> In other words, either the regular completion code is executed by
> scsi_done_internal() or error handling is initiated by
> scsi_timeout().
> Only one of the two happens.
> 
> The only exception is that .eh_timed_out() may be called concurrently
> with the regular completion handler. Hence this patch for
> ufshcd_eh_timed_out().
> 
> Bart.


Hi Bart,

I still confused.
When eh_timed_out() called concurrently with the regular completion
handler.
Both process use test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state) to
set SCMD_STATE_COMPLETE flag.
test_and_set_bit should be atomice operation? and only one can set this
bit than run forward?

BTW, I still don't understand if both eh_timed_out and regular
completion handler set SCMD_STATE_COMPLETE,
what is the relation between SCMD_STATE_COMPLETE and
ufshcd_queuecommand null pointer?

Thanks.
Peter
Bart Van Assche June 25, 2024, 4:33 p.m. UTC | #13
On 6/25/24 3:04 AM, Peter Wang (王信友) wrote:
> When eh_timed_out() called concurrently with the regular completion
> handler.
> Both process use test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state) to
> set SCMD_STATE_COMPLETE flag.
> test_and_set_bit should be atomice operation? and only one can set this
> bit than run forward?

Yes, only one of the two test_and_set_bit(SCMD_STATE_COMPLETE,
&cmd->state) calls will set the SCMD_STATE_COMPLETE bit and only
one of these two calls will return the value 'true'.

> BTW, I still don't understand if both eh_timed_out and regular
> completion handler set SCMD_STATE_COMPLETE,
> what is the relation between SCMD_STATE_COMPLETE and
> ufshcd_queuecommand null pointer?

While ufshcd_eh_timed_out() does not set the SCMD_STATE_COMPLETE bit,
its caller may set that bit after ufshcd_eh_timed_out() has returned.
Hence, a command may be completed while ufshcd_eh_timed_out() is in
progress.

Thanks,

Bart.
Peter Wang (王信友) June 26, 2024, 3:54 a.m. UTC | #14
On Tue, 2024-06-25 at 09:33 -0700, Bart Van Assche wrote:
> 
> 
> While ufshcd_eh_timed_out() does not set the SCMD_STATE_COMPLETE bit,
> its caller may set that bit after ufshcd_eh_timed_out() has returned.
> Hence, a command may be completed while ufshcd_eh_timed_out() is in
> progress.
> 
> Thanks,
> 
> Bart.

Hi Bart,

So, when this concurrency issue happen, which one set the
SCMD_STATE_COMPLETE flag?
scsi_timeout or scsi_done_internal?
And why ufshcd_queuecommand got null pointer? which pointer is null?



Thanks.
Peter
Bart Van Assche June 26, 2024, 9:54 p.m. UTC | #15
On 6/25/24 8:54 PM, Peter Wang (王信友) wrote:
> And why ufshcd_queuecommand got null pointer? which pointer is null?

I'm not sure. faddr2line reports that the crash happens in the source
code line with the following assignment: "sg_dma_len(sg) = sg->length".
That seems weird to me. If the sg pointer would be invalid then an
earlier dereference of the 'sg' pointer should already have triggered a
crash. The entire function is as follows:

int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int 
nents,
		enum dma_data_direction dir, unsigned long attrs)
{
	int i;
	struct scatterlist *sg;

	for_each_sg(sgl, sg, nents, i) {
		sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
				sg->offset, sg->length, dir, attrs);
		if (sg->dma_address == DMA_MAPPING_ERROR)
			goto out_unmap;
		sg_dma_len(sg) = sg->length;
	}

	return nents;

out_unmap:
	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
	return -EIO;
}

Bart.
Peter Wang (王信友) June 27, 2024, 10:56 a.m. UTC | #16
On Wed, 2024-06-26 at 14:54 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 6/25/24 8:54 PM, Peter Wang (王信友) wrote:
> > And why ufshcd_queuecommand got null pointer? which pointer is
> null?
> 
> I'm not sure. faddr2line reports that the crash happens in the source
> code line with the following assignment: "sg_dma_len(sg) = sg-
> >length".
> That seems weird to me. If the sg pointer would be invalid then an
> earlier dereference of the 'sg' pointer should already have triggered
> a
> 



Hi Bart,

This is really weird.
Perphas it is dram corrupt issue?
And is unrelated to the ufshcd_abort racing I think.


Thanks.
Peter
Bart Van Assche June 27, 2024, 4:33 p.m. UTC | #17
On 6/27/24 3:56 AM, Peter Wang (王信友) wrote:
> This is really weird.
> Perphas it is dram corrupt issue?
> And is unrelated to the ufshcd_abort racing I think.

We have seen this crash five times with kernel 5.15. I think that
the number of occurrences is too high to be caused by DRAM corruption.
Anyway, I will leave out patches 7/8 and 8/8 from this patch series.
We can revisit these patches if the issue would be observed again with
a later kernel.

Bart.