[v5,00/16] Add Multi Circular Queue Support

Message ID cover.1669176158.git.quic_asutoshd@quicinc.com

Message

Asutosh Das Nov. 23, 2022, 4:10 a.m. UTC
UFS Multi-Circular Queue (MCQ) has been added in UFSHCI v4.0 to improve storage performance.
The implementation uses the shared tagging mechanism so that tags are shared
among the hardware queues. The number of hardware queues is configurable.
This series doesn't include the ESI implementation for completion handling.
This implementation has been verified on a Qualcomm platform.

Please take a look and let us know your thoughts.

v4 -> v5:
- Fixed failure to fallback to SDB during initialization
- Fixed failure when rpm-lvl=5 in the ufshcd_host_reset_and_restore() path
- Improved ufshcd_mcq_config_nr_queues() to handle different configurations
- Addressed Bart's comments
- Verified read/write using FIO, clock gating, runtime-pm[lvl=3, lvl=5]

v3 -> v4:
- Added a kernel module parameter to disable MCQ mode
- Added Bart's reviewed-by tag for some patches
- Addressed Bart's comments

v2 -> v3:
- Split ufshcd_config_mcq() into ufshcd_alloc_mcq() and ufshcd_config_mcq()
- Use devm_kzalloc() in ufshcd_mcq_init()
- Free memory and resource allocation on error paths
- Corrected typos in code comments

v1 -> v2:
- Added a non-MCQ-related change to use a function to extract the UFS extended
feature
- Addressed Mani's comments
- Addressed Bart's comments

v1:
- Split the changes
- Addressed Bart's comments
- Addressed Bean's comments

* RFC versions:
v2 -> v3:
- Split the changes based on functionality
- Addressed queue configuration issues
- Faster SQE tail pointer increments
- Addressed comments from Bart and Manivannan

v1 -> v2:
- Enabled host_tagset
- Added queue num configuration support
- Added one more vop to allow the vendor to provide the desired MAC
- Determine nutrs and can_queue by considering MAC, bqueuedepth, and EXT_IID support
- Postponed MCQ initialization and scsi_add_host() to async probe
- Used (EXT_IID, Task Tag) tuple to support up to 4096 tasks (theoretically)

Asutosh Das (16):
  ufs: core: Optimize duplicate code to read extended feature
  ufs: core: Probe for ext_iid support
  ufs: core: Introduce Multi-circular queue capability
  ufs: core: Defer adding host to scsi if mcq is supported
  ufs: core: mcq: Add Multi Circular Queue support
  ufs: core: mcq: Configure resource regions
  ufs: core: mcq: Calculate queue depth
  ufs: core: mcq: Allocate memory for mcq mode
  ufs: core: mcq: Configure operation and runtime interface
  ufs: core: mcq: Use shared tags for MCQ mode
  ufs: core: Prepare ufshcd_send_command for mcq
  ufs: core: mcq: Find hardware queue to queue request
  ufs: core: Prepare for completion in mcq
  ufs: mcq: Add completion support of a cqe
  ufs: core: mcq: Add completion support in poll
  ufs: core: mcq: Enable Multi Circular Queue

 drivers/ufs/core/Makefile      |   2 +-
 drivers/ufs/core/ufs-mcq.c     | 509 +++++++++++++++++++++++++++++++++++++++++
 drivers/ufs/core/ufshcd-priv.h |  84 ++++++-
 drivers/ufs/core/ufshcd.c      | 390 +++++++++++++++++++++++++------
 drivers/ufs/host/ufs-qcom.c    |  48 ++++
 drivers/ufs/host/ufs-qcom.h    |   4 +
 include/ufs/ufs.h              |   6 +
 include/ufs/ufshcd.h           | 125 ++++++++++
 include/ufs/ufshci.h           |  64 ++++++
 9 files changed, 1165 insertions(+), 67 deletions(-)
 create mode 100644 drivers/ufs/core/ufs-mcq.c

Comments

Bart Van Assche Nov. 26, 2022, 12:20 a.m. UTC | #1
On 11/22/22 20:10, Asutosh Das wrote:
> +module_param_cb(use_mcq_mode, &mcq_mode_ops, &use_mcq_mode, 0644);
> +MODULE_PARM_DESC(mcq_mode, "Control MCQ mode for UFSHCI 4.0 controllers");

Please make this description more detailed. The following information 
should be added:
* 0 disables MCQ.
* 1 enables MCQ.
* MCQ is enabled by default.

Once that information has been added, feel free to add:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Manivannan Sadhasivam Nov. 28, 2022, 12:20 p.m. UTC | #2
On Tue, Nov 22, 2022 at 08:10:14PM -0800, Asutosh Das wrote:
> The code to parse the extended feature is duplicated twice
> in the ufs core. Replace the duplicated code with a
> function.
> 
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

Thanks,
Mani

> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufshcd.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 768cb49..c9d7b78 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -215,6 +215,17 @@ ufs_get_desired_pm_lvl_for_dev_link_state(enum ufs_dev_pwr_mode dev_state,
>  	return UFS_PM_LVL_0;
>  }
>  
> +static unsigned int ufs_get_ext_ufs_feature(struct ufs_hba *hba,
> +					    const u8 *desc_buf)
> +{
> +	if (hba->desc_size[QUERY_DESC_IDN_DEVICE] <
> +	    DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP + 4)
> +		return 0;
> +
> +	return get_unaligned_be32(desc_buf +
> +				  DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP);
> +}
> +
>  static const struct ufs_dev_quirk ufs_fixups[] = {
>  	/* UFS cards deviations table */
>  	{ .wmanufacturerid = UFS_VENDOR_MICRON,
> @@ -7584,13 +7595,7 @@ static void ufshcd_wb_probe(struct ufs_hba *hba, const u8 *desc_buf)
>  	     (hba->dev_quirks & UFS_DEVICE_QUIRK_SUPPORT_EXTENDED_FEATURES)))
>  		goto wb_disabled;
>  
> -	if (hba->desc_size[QUERY_DESC_IDN_DEVICE] <
> -	    DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP + 4)
> -		goto wb_disabled;
> -
> -	ext_ufs_feature = get_unaligned_be32(desc_buf +
> -					DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP);
> -
> +	ext_ufs_feature = ufs_get_ext_ufs_feature(hba, desc_buf);
>  	if (!(ext_ufs_feature & UFS_DEV_WRITE_BOOSTER_SUP))
>  		goto wb_disabled;
>  
> @@ -7644,7 +7649,7 @@ static void ufshcd_temp_notif_probe(struct ufs_hba *hba, const u8 *desc_buf)
>  	if (!(hba->caps & UFSHCD_CAP_TEMP_NOTIF) || dev_info->wspecversion < 0x300)
>  		return;
>  
> -	ext_ufs_feature = get_unaligned_be32(desc_buf + DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP);
> +	ext_ufs_feature = ufs_get_ext_ufs_feature(hba, desc_buf);
>  
>  	if (ext_ufs_feature & UFS_DEV_LOW_TEMP_NOTIF)
>  		mask |= MASK_EE_TOO_LOW_TEMP;
> -- 
> 2.7.4
>
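
For anyone reading along outside the kernel tree, the behaviour of the new helper can be modelled in userspace. This is only a sketch: `read_be32()` stands in for the kernel's `get_unaligned_be32()`, and the offset `0x4F` is an assumption made for illustration (the real value is whatever `DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP` is defined as). The names `read_be32` and `get_ext_ufs_feature` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed offset, for illustration only. */
#define EXT_UFS_FEATURE_SUP_OFFSET 0x4F

/* Byte-wise big-endian load; safe for any alignment of p. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Returns the 32-bit extended feature field, or 0 (no features)
 * when the descriptor is too short to contain it.
 */
static uint32_t get_ext_ufs_feature(const uint8_t *desc_buf, size_t desc_size)
{
	if (desc_size < EXT_UFS_FEATURE_SUP_OFFSET + 4)
		return 0;

	return read_be32(desc_buf + EXT_UFS_FEATURE_SUP_OFFSET);
}
```

With the helper in place, both callers get the size check; in the quoted diff, ufshcd_temp_notif_probe() previously read the field without one.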
Manivannan Sadhasivam Nov. 28, 2022, 2:29 p.m. UTC | #3
On Tue, Nov 22, 2022 at 08:10:16PM -0800, Asutosh Das wrote:
> Add support to check for MCQ capability in the UFSHC.
> Add a module parameter to disable MCQ if needed.
> 
> Co-developed-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>

Couple of nitpicks below, with those addressed:

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

> ---
>  drivers/ufs/core/ufshcd.c | 31 +++++++++++++++++++++++++++++++
>  include/ufs/ufshcd.h      |  2 ++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 66b797f..08be8ad 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -89,6 +89,33 @@
>  /* Polling time to wait for fDeviceInit */
>  #define FDEVICEINIT_COMPL_TIMEOUT 1500 /* millisecs */
>  
> +/* UFSHC 4.0 compliant HC support this mode, refer param_set_mcq_mode() */
> +static bool use_mcq_mode = true;
> +
> +static inline bool is_mcq_supported(struct ufs_hba *hba)

Please get rid of inline keyword and let the compiler handle it.

> +{
> +	return hba->mcq_sup && use_mcq_mode;
> +}
> +
> +static int param_set_mcq_mode(const char *val, const struct kernel_param *kp)
> +{
> +	int ret;
> +
> +	ret = param_set_bool(val, kp);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static const struct kernel_param_ops mcq_mode_ops = {
> +	.set = param_set_mcq_mode,
> +	.get = param_get_bool,
> +};
> +
> +module_param_cb(use_mcq_mode, &mcq_mode_ops, &use_mcq_mode, 0644);
> +MODULE_PARM_DESC(mcq_mode, "Control MCQ mode for UFSHCI 4.0 controllers");

Is it ok to mention only 4.0? What about future revisions?

Thanks,
Mani
> +
>  #define ufshcd_toggle_vreg(_dev, _vreg, _on)				\
>  	({                                                              \
>  		int _ret;                                               \
> @@ -2258,6 +2285,10 @@ static inline int ufshcd_hba_capabilities(struct ufs_hba *hba)
>  	if (err)
>  		dev_err(hba->dev, "crypto setup failed\n");
>  
> +	hba->mcq_sup = FIELD_GET(MASK_MCQ_SUPPORT, hba->capabilities);
> +	if (!hba->mcq_sup)
> +		return err;
> +
>  	hba->mcq_capabilities = ufshcd_readl(hba, REG_MCQCAP);
>  	hba->ext_iid_sup = FIELD_GET(MASK_EXT_IID_SUPPORT,
>  				     hba->mcq_capabilities);
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index aec37cb9..70c0f9f 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -832,6 +832,7 @@ struct ufs_hba_monitor {
>   * @complete_put: whether or not to call ufshcd_rpm_put() from inside
>   *	ufshcd_resume_complete()
>   * @ext_iid_sup: is EXT_IID is supported by UFSHC
> + * @mcq_sup: is mcq supported by UFSHC
>   */
>  struct ufs_hba {
>  	void __iomem *mmio_base;
> @@ -982,6 +983,7 @@ struct ufs_hba {
>  	u32 luns_avail;
>  	bool complete_put;
>  	bool ext_iid_sup;
> +	bool mcq_sup;
>  };
>  
>  /* Returns true if clocks can be gated. Otherwise false */
> -- 
> 2.7.4
>
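
The capability check in this patch boils down to extracting one bit field and AND-ing it with the module parameter. A rough userspace model follows, assuming the MCQ support flag is bit 30 of the capabilities register (the real mask is whatever `MASK_MCQ_SUPPORT` is defined as); `field_get()` is a hypothetical stand-in for the kernel's `FIELD_GET()` macro and `mcq_usable()` for `is_mcq_supported()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define MCQ_SUPPORT_MASK 0x40000000u

/*
 * Extract the field selected by a non-zero, contiguous mask and
 * shift it down to bit 0. Dividing by the lowest set bit of the
 * mask is equivalent to the right shift FIELD_GET() performs.
 */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	uint32_t lowest_bit = mask & (~mask + 1u);

	return (reg & mask) / lowest_bit;
}

/* MCQ is used only if the controller reports it and the user allows it. */
static bool mcq_usable(uint32_t capabilities, bool use_mcq_mode)
{
	bool mcq_sup = field_get(MCQ_SUPPORT_MASK, capabilities) != 0;

	return mcq_sup && use_mcq_mode;
}
```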
Manivannan Sadhasivam Nov. 28, 2022, 2:53 p.m. UTC | #4
On Tue, Nov 22, 2022 at 08:10:18PM -0800, Asutosh Das wrote:
> Add support for multi-circular queue (MCQ) which has been added
> in UFSHC v4.0 standard in addition to the Single Doorbell mode.
> The MCQ mode supports multiple submission and completion queues.
> Add support to configure the number of queues.
> 

The patch subject is pretty opaque. Please use something like "Add initial
Multi Circular Queue support" to make it clear that this patch only adds
support for configuring the queues and not the full MCQ support.

Also, this patch adds the module params for queues, so that should be mentioned
in the description.

> Co-developed-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>
> ---
>  drivers/ufs/core/Makefile      |   2 +-
>  drivers/ufs/core/ufs-mcq.c     | 125 +++++++++++++++++++++++++++++++++++++++++
>  drivers/ufs/core/ufshcd-priv.h |   1 +
>  drivers/ufs/core/ufshcd.c      |  12 ++++
>  include/ufs/ufshcd.h           |   4 ++
>  5 files changed, 143 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/ufs/core/ufs-mcq.c
> 
> diff --git a/drivers/ufs/core/Makefile b/drivers/ufs/core/Makefile
> index 62f38c5..4d02e0f 100644
> --- a/drivers/ufs/core/Makefile
> +++ b/drivers/ufs/core/Makefile
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
>  obj-$(CONFIG_SCSI_UFSHCD)		+= ufshcd-core.o
> -ufshcd-core-y				+= ufshcd.o ufs-sysfs.o
> +ufshcd-core-y				+= ufshcd.o ufs-sysfs.o ufs-mcq.o
>  ufshcd-core-$(CONFIG_DEBUG_FS)		+= ufs-debugfs.o
>  ufshcd-core-$(CONFIG_SCSI_UFS_BSG)	+= ufs_bsg.o
>  ufshcd-core-$(CONFIG_SCSI_UFS_CRYPTO)	+= ufshcd-crypto.o
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> new file mode 100644
> index 0000000..3818f45
> --- /dev/null
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -0,0 +1,125 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022 Qualcomm Innovation Center. All rights reserved.
> + *
> + * Authors:
> + *	Asutosh Das <quic_asutoshd@quicinc.com>
> + *	Can Guo <quic_cang@quicinc.com>
> + */
> +
> +#include <asm/unaligned.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include "ufshcd-priv.h"
> +
> +#define UFS_MCQ_MIN_RW_QUEUES 2
> +#define UFS_MCQ_MIN_READ_QUEUES 0
> +#define UFS_MCQ_NUM_DEV_CMD_QUEUES 1
> +#define UFS_MCQ_MIN_POLL_QUEUES 0
> +

Remove extra new line

> +
> +static int rw_queue_count_set(const char *val, const struct kernel_param *kp)
> +{
> +	return param_set_uint_minmax(val, kp, UFS_MCQ_MIN_RW_QUEUES,
> +				     num_possible_cpus());
> +}
> +
> +static const struct kernel_param_ops rw_queue_count_ops = {
> +	.set = rw_queue_count_set,
> +	.get = param_get_uint,
> +};
> +
> +static unsigned int rw_queues;
> +module_param_cb(rw_queues, &rw_queue_count_ops, &rw_queues, 0644);
> +MODULE_PARM_DESC(rw_queues,
> +		 "Number of interrupt driven I/O queues used for rw. Default value is nr_cpus");
> +
> +static int read_queue_count_set(const char *val, const struct kernel_param *kp)
> +{
> +	return param_set_uint_minmax(val, kp, UFS_MCQ_MIN_READ_QUEUES,
> +				     num_possible_cpus());
> +}
> +
> +static const struct kernel_param_ops read_queue_count_ops = {
> +	.set = read_queue_count_set,
> +	.get = param_get_uint,
> +};
> +
> +static unsigned int read_queues;
> +module_param_cb(read_queues, &read_queue_count_ops, &read_queues, 0644);
> +MODULE_PARM_DESC(read_queues,
> +		 "Number of interrupt driven read queues used for read. Default value is 0");
> +
> +static int poll_queue_count_set(const char *val, const struct kernel_param *kp)
> +{
> +	return param_set_uint_minmax(val, kp, UFS_MCQ_MIN_POLL_QUEUES,
> +				     num_possible_cpus());
> +}
> +
> +static const struct kernel_param_ops poll_queue_count_ops = {
> +	.set = poll_queue_count_set,
> +	.get = param_get_uint,
> +};
> +
> +static unsigned int poll_queues = 1;
> +module_param_cb(poll_queues, &poll_queue_count_ops, &poll_queues, 0644);
> +MODULE_PARM_DESC(poll_queues,
> +		 "Number of poll queues used for r/w. Default value is 1");
> +
> +static int ufshcd_mcq_config_nr_queues(struct ufs_hba *hba)
> +{
> +	int i;
> +	u32 hba_maxq, rem, tot_queues;
> +	struct Scsi_Host *host = hba->host;
> +
> +	hba_maxq = FIELD_GET(GENMASK(7, 0), hba->mcq_capabilities);

It'd be good to add a definition for GENMASK(7, 0).

> +
> +	tot_queues = UFS_MCQ_NUM_DEV_CMD_QUEUES + read_queues + poll_queues +
> +			rw_queues;
> +
> +	if (hba_maxq < tot_queues) {
> +		dev_err(hba->dev, "Total queues (%d) exceeds HC capacity (%d)\n",
> +			tot_queues, hba_maxq);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	rem = hba_maxq - UFS_MCQ_NUM_DEV_CMD_QUEUES;
> +
> +	if (rw_queues) {
> +		hba->nr_queues[HCTX_TYPE_DEFAULT] = rw_queues;
> +		rem -= hba->nr_queues[HCTX_TYPE_DEFAULT];
> +	} else {
> +		rw_queues = num_possible_cpus();
> +	}
> +
> +	if (poll_queues) {
> +		hba->nr_queues[HCTX_TYPE_POLL] = poll_queues;
> +		rem -= hba->nr_queues[HCTX_TYPE_POLL];
> +	}
> +
> +	if (read_queues) {
> +		hba->nr_queues[HCTX_TYPE_READ] = read_queues;
> +		rem -= hba->nr_queues[HCTX_TYPE_READ];
> +	}
> +
> +	if (!hba->nr_queues[HCTX_TYPE_DEFAULT])
> +		hba->nr_queues[HCTX_TYPE_DEFAULT] = min3(rem, rw_queues,
> +							 num_possible_cpus());
> +
> +	for (i = 0; i < HCTX_MAX_TYPES; i++)
> +		host->nr_hw_queues += hba->nr_queues[i];
> +
> +	hba->nr_hw_queues = host->nr_hw_queues + UFS_MCQ_NUM_DEV_CMD_QUEUES;
> +	return 0;
> +}
> +
> +int ufshcd_mcq_init(struct ufs_hba *hba)
> +{
> +	int ret;
> +
> +	ret = ufshcd_mcq_config_nr_queues(hba);
> +
> +	return ret;
> +}
> +
> diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
> index a9e8e1f..9368ba2 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -61,6 +61,7 @@ int ufshcd_query_attr(struct ufs_hba *hba, enum query_opcode opcode,
>  int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
>  	enum flag_idn idn, u8 index, bool *flag_res);
>  void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
> +int ufshcd_mcq_init(struct ufs_hba *hba);
>  
>  #define SD_ASCII_STD true
>  #define SD_RAW false
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 42c49ce..0c4cd8f 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -8196,6 +8196,11 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
>  	return ret;
>  }
>  
> +static int ufshcd_alloc_mcq(struct ufs_hba *hba)
> +{
> +	return ufshcd_mcq_init(hba);
> +}
> +
>  /**
>   * ufshcd_probe_hba - probe hba to detect device and initialize it
>   * @hba: per-adapter instance
> @@ -8245,6 +8250,13 @@ static int ufshcd_probe_hba(struct ufs_hba *hba, bool init_dev_params)
>  			goto out;
>  
>  		if (is_mcq_supported(hba)) {
> +			ret = ufshcd_alloc_mcq(hba);
> +			if (ret) {
> +				/* Continue with SDB mode */
> +				use_mcq_mode = false;
> +				dev_err(hba->dev, "MCQ mode is disabled, err=%d\n",
> +					 ret);
> +			}
>  			ret = scsi_add_host(host, hba->dev);
>  			if (ret) {
>  				dev_err(hba->dev, "scsi_add_host failed\n");
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index 70c0f9f..dee0b37 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -833,6 +833,8 @@ struct ufs_hba_monitor {
>   *	ufshcd_resume_complete()
>   * @ext_iid_sup: is EXT_IID is supported by UFSHC
>   * @mcq_sup: is mcq supported by UFSHC
> + * @nr_hw_queues: number of hardware queues configured
> + * @nr_queues: number of Queues of different queue types
>   */
>  struct ufs_hba {
>  	void __iomem *mmio_base;
> @@ -984,6 +986,8 @@ struct ufs_hba {
>  	bool complete_put;
>  	bool ext_iid_sup;
>  	bool mcq_sup;
> +	unsigned int nr_hw_queues;
> +	unsigned int nr_queues[HCTX_MAX_TYPES];

Can these two members be added before the bool members to avoid any holes?

Thanks,
Mani

>  };
>  
>  /* Returns true if clocks can be gated. Otherwise false */
> -- 
> 2.7.4
>
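
To make the queue accounting in ufshcd_mcq_config_nr_queues() easier to follow, here is a userspace model of the same arithmetic. It is a sketch, not the driver code: the module parameters and num_possible_cpus() are passed in as plain arguments, and `struct queue_cfg` and `config_nr_queues()` are hypothetical names.

```c
#include <errno.h>
#include <string.h>

enum { TYPE_DEFAULT, TYPE_READ, TYPE_POLL, MAX_TYPES };

#define NUM_DEV_CMD_QUEUES 1

struct queue_cfg {
	unsigned int nr_queues[MAX_TYPES];
	unsigned int nr_io_queues;	/* models host->nr_hw_queues */
	unsigned int nr_hw_queues;	/* I/O queues plus the device command queue */
};

static unsigned int min3u(unsigned int a, unsigned int b, unsigned int c)
{
	unsigned int m = a < b ? a : b;

	return m < c ? m : c;
}

static int config_nr_queues(struct queue_cfg *cfg, unsigned int hba_maxq,
			    unsigned int ncpus, unsigned int rw_queues,
			    unsigned int read_queues, unsigned int poll_queues)
{
	unsigned int tot_queues, rem, i;

	memset(cfg, 0, sizeof(*cfg));

	/* Explicitly requested queues must fit in what the controller offers. */
	tot_queues = NUM_DEV_CMD_QUEUES + read_queues + poll_queues + rw_queues;
	if (hba_maxq < tot_queues)
		return -EOPNOTSUPP;

	rem = hba_maxq - NUM_DEV_CMD_QUEUES;

	if (rw_queues) {
		cfg->nr_queues[TYPE_DEFAULT] = rw_queues;
		rem -= rw_queues;
	} else {
		rw_queues = ncpus;	/* default: one rw queue per CPU */
	}

	if (poll_queues) {
		cfg->nr_queues[TYPE_POLL] = poll_queues;
		rem -= poll_queues;
	}

	if (read_queues) {
		cfg->nr_queues[TYPE_READ] = read_queues;
		rem -= read_queues;
	}

	/* No explicit rw count: take what is left, capped at the CPU count. */
	if (!cfg->nr_queues[TYPE_DEFAULT])
		cfg->nr_queues[TYPE_DEFAULT] = min3u(rem, rw_queues, ncpus);

	for (i = 0; i < MAX_TYPES; i++)
		cfg->nr_io_queues += cfg->nr_queues[i];

	cfg->nr_hw_queues = cfg->nr_io_queues + NUM_DEV_CMD_QUEUES;
	return 0;
}
```

For example, with 32 controller queues, 8 CPUs and the defaults (rw_queues=0, read_queues=0, poll_queues=1), the model ends up with 8 default queues, 1 poll queue and 10 hardware queues in total.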
Manivannan Sadhasivam Nov. 28, 2022, 3:15 p.m. UTC | #5
On Tue, Nov 22, 2022 at 08:10:20PM -0800, Asutosh Das wrote:
> The ufs device defines the supported queuedepth by
> bqueuedepth which has a max value of 256.
> The HC defines MAC (Max Active Commands), which defines
> the max number of commands that can be in flight to the ufs
> device.
> Calculate and configure the nutrs based on both these
> values.
> 
> Co-developed-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>
> ---
>  drivers/ufs/core/ufs-mcq.c     | 32 ++++++++++++++++++++++++++++++++
>  drivers/ufs/core/ufshcd-priv.h |  9 +++++++++
>  drivers/ufs/core/ufshcd.c      | 17 ++++++++++++++++-
>  drivers/ufs/host/ufs-qcom.c    |  8 ++++++++
>  include/ufs/ufs.h              |  2 ++
>  include/ufs/ufshcd.h           |  2 ++
>  include/ufs/ufshci.h           |  1 +
>  7 files changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index 4aaa6aa..e95f748 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -18,6 +18,8 @@
>  #define UFS_MCQ_NUM_DEV_CMD_QUEUES 1
>  #define UFS_MCQ_MIN_POLL_QUEUES 0
>  
> +#define MAX_DEV_CMD_ENTRIES	2
> +#define MCQ_CFG_MAC_MASK	GENMASK(16, 8)
>  #define MCQ_QCFGPTR_MASK	GENMASK(7, 0)
>  #define MCQ_QCFGPTR_UNIT	0x200
>  #define MCQ_SQATTR_OFFSET(c) \
> @@ -88,6 +90,36 @@ static const struct ufshcd_res_info ufs_res_info[RES_MAX] = {
>  	{.name = "mcq_vs",},
>  };
>  
> +/**
> + * ufshcd_mcq_decide_queue_depth - decide the queue depth
> + * @hba - per adapter instance
> + *

Kernel doc should define the return value also.

> + * MAC - Max. Active Command of the Host Controller (HC)
> + * HC wouldn't send more than this commands to the device.
> + * It is mandatory to implement get_hba_mac() to enable MCQ mode.
> + * Calculates and adjusts the queue depth based on the depth
> + * supported by the HC and ufs device.
> + */
> +int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba)
> +{
> +	int mac;
> +
> +	/* Mandatory to implement get_hba_mac() */
> +	mac = ufshcd_mcq_vops_get_hba_mac(hba);
> +	if (mac < 0) {
> +		dev_err(hba->dev, "Failed to get mac, err=%d\n", mac);
> +		return mac;
> +	}
> +
> +	WARN_ON(!hba->dev_info.bqueuedepth);

Instead of panic, you could just print and return an error.

> +	/*
> +	 * max. value of bqueuedepth = 256, mac is host dependent.
> +	 * It is mandatory for UFS device to define bQueueDepth if
> +	 * shared queuing architecture is enabled.
> +	 */
> +	return min_t(int, mac, hba->dev_info.bqueuedepth);
> +}
> +
>  static int ufshcd_mcq_config_resource(struct ufs_hba *hba)
>  {
>  	struct platform_device *pdev = to_platform_device(hba->dev);
> diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
> index 9368ba2..9f40fa5 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -62,6 +62,7 @@ int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
>  	enum flag_idn idn, u8 index, bool *flag_res);
>  void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
>  int ufshcd_mcq_init(struct ufs_hba *hba);
> +int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba);
>  
>  #define SD_ASCII_STD true
>  #define SD_RAW false
> @@ -227,6 +228,14 @@ static inline void ufshcd_vops_config_scaling_param(struct ufs_hba *hba,
>  		hba->vops->config_scaling_param(hba, p, data);
>  }
>  
> +static inline int ufshcd_mcq_vops_get_hba_mac(struct ufs_hba *hba)

Again, no inline please.

> +{
> +	if (hba->vops && hba->vops->get_hba_mac)
> +		return hba->vops->get_hba_mac(hba);
> +
> +	return -EOPNOTSUPP;
> +}
> +
>  extern const struct ufs_pm_lvl_states ufs_pm_lvl_states[];
>  
>  /**
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 0c4cd8f..ae065da 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -7783,6 +7783,7 @@ static int ufs_get_device_desc(struct ufs_hba *hba)
>  	/* getting Specification Version in big endian format */
>  	dev_info->wspecversion = desc_buf[DEVICE_DESC_PARAM_SPEC_VER] << 8 |
>  				      desc_buf[DEVICE_DESC_PARAM_SPEC_VER + 1];
> +	dev_info->bqueuedepth = desc_buf[DEVICE_DESC_PARAM_Q_DPTH];
>  	b_ufs_feature_sup = desc_buf[DEVICE_DESC_PARAM_UFS_FEAT];
>  
>  	model_index = desc_buf[DEVICE_DESC_PARAM_PRDCT_NAME];
> @@ -8198,7 +8199,21 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
>  
>  static int ufshcd_alloc_mcq(struct ufs_hba *hba)
>  {
> -	return ufshcd_mcq_init(hba);
> +	int ret;
> +	int old_nutrs = hba->nutrs;
> +
> +	ret = ufshcd_mcq_decide_queue_depth(hba);
> +	if (ret < 0)
> +		return ret;
> +
> +	hba->nutrs = ret;
> +	ret = ufshcd_mcq_init(hba);
> +	if (ret) {
> +		hba->nutrs = old_nutrs;
> +		return ret;
> +	}
> +
> +	return 0;
>  }
>  
>  /**
> diff --git a/drivers/ufs/host/ufs-qcom.c b/drivers/ufs/host/ufs-qcom.c
> index 8ad1415..7bd3c37 100644
> --- a/drivers/ufs/host/ufs-qcom.c
> +++ b/drivers/ufs/host/ufs-qcom.c
> @@ -25,6 +25,7 @@
>  #define UFS_QCOM_DEFAULT_DBG_PRINT_EN	\
>  	(UFS_QCOM_DBG_PRINT_REGS_EN | UFS_QCOM_DBG_PRINT_TEST_BUS_EN)
>  
> +#define MAX_SUPP_MAC 64

Similar definitions are part of ufs-qcom.h.

Thanks,
Mani

>  enum {
>  	TSTBUS_UAWM,
>  	TSTBUS_UARM,
> @@ -1424,6 +1425,12 @@ static void ufs_qcom_config_scaling_param(struct ufs_hba *hba,
>  }
>  #endif
>  
> +static int ufs_qcom_get_hba_mac(struct ufs_hba *hba)
> +{
> +	/* Qualcomm HC supports up to 64 */
> +	return MAX_SUPP_MAC;
> +}
> +
>  /*
>   * struct ufs_hba_qcom_vops - UFS QCOM specific variant operations
>   *
> @@ -1447,6 +1454,7 @@ static const struct ufs_hba_variant_ops ufs_hba_qcom_vops = {
>  	.device_reset		= ufs_qcom_device_reset,
>  	.config_scaling_param = ufs_qcom_config_scaling_param,
>  	.program_key		= ufs_qcom_ice_program_key,
> +	.get_hba_mac		= ufs_qcom_get_hba_mac,
>  };
>  
>  /**
> diff --git a/include/ufs/ufs.h b/include/ufs/ufs.h
> index ba2a1d8..5112418 100644
> --- a/include/ufs/ufs.h
> +++ b/include/ufs/ufs.h
> @@ -591,6 +591,8 @@ struct ufs_dev_info {
>  	u8	*model;
>  	u16	wspecversion;
>  	u32	clk_gating_wait_us;
> +	/* Stores the depth of queue in UFS device */
> +	u8	bqueuedepth;
>  
>  	/* UFS HPB related flag */
>  	bool	hpb_enabled;
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index 7bf7599..e03b310 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -297,6 +297,7 @@ struct ufs_pwr_mode_info {
>   * @config_scaling_param: called to configure clock scaling parameters
>   * @program_key: program or evict an inline encryption key
>   * @event_notify: called to notify important events
> + * @get_hba_mac: called to get vendor specific mac value, mandatory for mcq mode
>   */
>  struct ufs_hba_variant_ops {
>  	const char *name;
> @@ -335,6 +336,7 @@ struct ufs_hba_variant_ops {
>  			       const union ufs_crypto_cfg_entry *cfg, int slot);
>  	void	(*event_notify)(struct ufs_hba *hba,
>  				enum ufs_event_type evt, void *data);
> +	int	(*get_hba_mac)(struct ufs_hba *hba);
>  };
>  
>  /* clock gating state  */
> diff --git a/include/ufs/ufshci.h b/include/ufs/ufshci.h
> index 4d4da06..67fcebd 100644
> --- a/include/ufs/ufshci.h
> +++ b/include/ufs/ufshci.h
> @@ -57,6 +57,7 @@ enum {
>  	REG_UFS_CCAP				= 0x100,
>  	REG_UFS_CRYPTOCAP			= 0x104,
>  
> +	REG_UFS_MCQ_CFG				= 0x380,
>  	UFSHCI_CRYPTO_REG_SPACE_SIZE		= 0x400,
>  };
>  
> -- 
> 2.7.4
>
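
The depth calculation discussed above is small enough to model in userspace. The sketch below follows the review suggestion of returning an error instead of using WARN_ON() when bqueuedepth is zero; the choice of -EINVAL is an assumption of this sketch, not something the patch or the review specifies, and `decide_queue_depth()` is a hypothetical name.

```c
#include <errno.h>

/*
 * mac:         value returned by the vendor get_hba_mac() hook,
 *              or a negative errno if the hook is missing or failed
 * bqueuedepth: queue depth reported by the UFS device
 *
 * Returns the queue depth to use, or a negative errno.
 */
static int decide_queue_depth(int mac, unsigned int bqueuedepth)
{
	if (mac < 0)
		return mac;

	/* Assumed error code; the device must report a depth for shared queuing. */
	if (!bqueuedepth)
		return -EINVAL;

	return mac < (int)bqueuedepth ? mac : (int)bqueuedepth;
}
```

With the Qualcomm value of 64 from this patch, a device reporting a depth of 32 would yield nutrs = 32, while a device reporting 255 would be limited to 64 by the host controller.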
Manivannan Sadhasivam Nov. 28, 2022, 3:59 p.m. UTC | #6
On Tue, Nov 22, 2022 at 08:10:23PM -0800, Asutosh Das wrote:
> Enable shared tags for MCQ. For UFS, this should
> not have a huge performance impact. It however
> simplifies the MCQ implementation and reuses most of
> the existing code in the issue and completion path.
> Also add multiple queue mapping to map_queue().
> 
> Co-developed-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

Thanks,
Mani

> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufs-mcq.c |  2 ++
>  drivers/ufs/core/ufshcd.c  | 28 ++++++++++++++++------------
>  2 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index ebecc47..e4ddb90 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -376,6 +376,7 @@ void ufshcd_mcq_make_queues_operational(struct ufs_hba *hba)
>  
>  int ufshcd_mcq_init(struct ufs_hba *hba)
>  {
> +	struct Scsi_Host *host = hba->host;
>  	struct ufs_hw_queue *hwq;
>  	int ret, i;
>  
> @@ -411,6 +412,7 @@ int ufshcd_mcq_init(struct ufs_hba *hba)
>  	/* Give dev_cmd_queue the minimal number of entries */
>  	hba->dev_cmd_queue->max_entries = MAX_DEV_CMD_ENTRIES;
>  
> +	host->host_tagset = 1;
>  	return 0;
>  }
>  
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 042ecf04..d61e99f 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -2763,24 +2763,28 @@ static inline bool is_device_wlun(struct scsi_device *sdev)
>   */
>  static void ufshcd_map_queues(struct Scsi_Host *shost)
>  {
> -	int i;
> +	struct ufs_hba *hba = shost_priv(shost);
> +	int i, queue_offset = 0;
> +
> +	if (!is_mcq_supported(hba)) {
> +		hba->nr_queues[HCTX_TYPE_DEFAULT] = 1;
> +		hba->nr_queues[HCTX_TYPE_READ] = 0;
> +		hba->nr_queues[HCTX_TYPE_POLL] = 1;
> +		hba->nr_hw_queues = 1;
> +	}
>  
>  	for (i = 0; i < shost->nr_maps; i++) {
>  		struct blk_mq_queue_map *map = &shost->tag_set.map[i];
>  
> -		switch (i) {
> -		case HCTX_TYPE_DEFAULT:
> -		case HCTX_TYPE_POLL:
> -			map->nr_queues = 1;
> -			break;
> -		case HCTX_TYPE_READ:
> -			map->nr_queues = 0;
> +		map->nr_queues = hba->nr_queues[i];
> +		if (!map->nr_queues)
>  			continue;
> -		default:
> -			WARN_ON_ONCE(true);
> -		}
> -		map->queue_offset = 0;
> +		map->queue_offset = queue_offset;
> +		if (i == HCTX_TYPE_POLL && !is_mcq_supported(hba))
> +			map->queue_offset = 0;
> +
>  		blk_mq_map_queues(map);
> +		queue_offset += map->nr_queues;
>  	}
>  }
>  
> -- 
> 2.7.4
>
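
The offset bookkeeping in the reworked ufshcd_map_queues() can be illustrated with a userspace model. This is a sketch under the assumption that only nr_queues and queue_offset matter; the actual CPU-to-queue assignment done by blk_mq_map_queues() is left out, and `struct queue_map` and `map_queues()` are hypothetical names.

```c
enum { TYPE_DEFAULT, TYPE_READ, TYPE_POLL, MAX_TYPES };

struct queue_map {
	unsigned int nr_queues;
	unsigned int queue_offset;
};

/*
 * Lay the queue types out back to back: each non-empty type starts
 * where the previous one ended. In single doorbell (non-MCQ) mode the
 * poll map shares hardware queue 0 with the default map.
 */
static void map_queues(struct queue_map maps[MAX_TYPES],
		       const unsigned int nr_queues[MAX_TYPES], int mcq)
{
	unsigned int queue_offset = 0;
	int i;

	for (i = 0; i < MAX_TYPES; i++) {
		maps[i].nr_queues = nr_queues[i];
		if (!maps[i].nr_queues)
			continue;

		maps[i].queue_offset = queue_offset;
		if (i == TYPE_POLL && !mcq)
			maps[i].queue_offset = 0;

		queue_offset += maps[i].nr_queues;
	}
}
```

For 8 default, 2 read and 1 poll queue in MCQ mode, the model gives offsets 0, 8 and 10. In single doorbell mode, with counts {1, 0, 1}, both the default and poll maps point at offset 0.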
Manivannan Sadhasivam Nov. 28, 2022, 4:08 p.m. UTC | #7
On Tue, Nov 22, 2022 at 08:10:25PM -0800, Asutosh Das wrote:
> Adds support to find the hardware queue on which the request
> would be queued.
> Since the very first queue is to serve device commands, an offset
> of 1 is added to the index of the hardware queue.
> 
> Co-developed-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>

One small nitpick below...

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufs-mcq.c     | 20 ++++++++++++++++++++
>  drivers/ufs/core/ufshcd-priv.h |  3 +++
>  drivers/ufs/core/ufshcd.c      |  3 +++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index 7179f86..10a0d0d7 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -94,6 +94,26 @@ static const struct ufshcd_res_info ufs_res_info[RES_MAX] = {
>  };
>  
>  /**
> + * ufshcd_mcq_req_to_hwq - find the hardware queue on which the
> + * request would be issued.
> + * @hba - per adapter instance
> + * @req - pointer to the request to be issued
> + *
> + * Returns the hardware queue instance on which the request would
> + * be queued.
> + */
> +struct ufs_hw_queue *ufshcd_mcq_req_to_hwq(struct ufs_hba *hba,
> +					 struct request *req)
> +{
> +	u32 utag = blk_mq_unique_tag(req);
> +	u32 hwq = blk_mq_unique_tag_to_hwq(utag);
> +
> +	/* uhq[0] is used to serve device commands */
> +	return &hba->uhq[hwq + UFSHCD_MCQ_IO_QUEUE_OFFSET];
> +}
> +

Remove extra newline.

Thanks,
Mani

> +
> +/**
>   * ufshcd_mcq_decide_queue_depth - decide the queue depth
>   * @hba - per adapter instance
>   *
> diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
> index 5616047..14df7ce 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -67,7 +67,10 @@ int ufshcd_mcq_memory_alloc(struct ufs_hba *hba);
>  void ufshcd_mcq_make_queues_operational(struct ufs_hba *hba);
>  void ufshcd_mcq_config_mac(struct ufs_hba *hba, u32 max_active_cmds);
>  void ufshcd_mcq_select_mcq_mode(struct ufs_hba *hba);
> +struct ufs_hw_queue *ufshcd_mcq_req_to_hwq(struct ufs_hba *hba,
> +					   struct request *req);
>  
> +#define UFSHCD_MCQ_IO_QUEUE_OFFSET	1
>  #define SD_ASCII_STD true
>  #define SD_RAW false
>  int ufshcd_read_string_desc(struct ufs_hba *hba, u8 desc_index,
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 93a9e38..52c0386 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -2921,6 +2921,9 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>  		goto out;
>  	}
>  
> +	if (is_mcq_enabled(hba))
> +		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
> +
>  	ufshcd_send_command(hba, tag, hwq);
>  
>  out:
> -- 
> 2.7.4
>
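
For reference, the lookup in ufshcd_mcq_req_to_hwq() relies on the block layer's unique tag, which packs the hardware queue index into the upper 16 bits and the per-queue tag into the lower 16 bits. A userspace sketch of that decoding plus the offset of 1 for the device command queue is shown below; the helper names are hypothetical and only mirror what blk_mq_unique_tag() and blk_mq_unique_tag_to_hwq() do.

```c
#include <stdint.h>

#define UNIQUE_TAG_BITS	16
#define UNIQUE_TAG_MASK	((1u << UNIQUE_TAG_BITS) - 1)

/* uhq[0] serves device commands, so I/O queues start at index 1. */
#define IO_QUEUE_OFFSET	1

/* Pack a hardware queue index and a tag the way the block layer does. */
static uint32_t make_unique_tag(uint32_t hwq, uint32_t tag)
{
	return (hwq << UNIQUE_TAG_BITS) | (tag & UNIQUE_TAG_MASK);
}

static uint32_t unique_tag_to_hwq(uint32_t utag)
{
	return utag >> UNIQUE_TAG_BITS;
}

static uint32_t unique_tag_to_tag(uint32_t utag)
{
	return utag & UNIQUE_TAG_MASK;
}

/* Index into the driver's hardware queue array for a given request. */
static uint32_t req_to_uhq_index(uint32_t utag)
{
	return unique_tag_to_hwq(utag) + IO_QUEUE_OFFSET;
}
```

A request on block-layer hardware queue 3 therefore lands on uhq[4], and one on hardware queue 0 lands on uhq[1].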
Manivannan Sadhasivam Nov. 28, 2022, 4:11 p.m. UTC | #8
On Tue, Nov 22, 2022 at 08:10:26PM -0800, Asutosh Das wrote:
> Modify completion path APIs and add completion queue
> entry.
> 
> Co-developed-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

Thanks,
Mani

> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufshcd-priv.h |  2 ++
>  drivers/ufs/core/ufshcd.c      | 80 ++++++++++++++++++++++++++----------------
>  2 files changed, 51 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
> index 14df7ce..6453449 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -61,6 +61,8 @@ int ufshcd_query_attr(struct ufs_hba *hba, enum query_opcode opcode,
>  int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
>  	enum flag_idn idn, u8 index, bool *flag_res);
>  void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
> +void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
> +			  struct cq_entry *cqe);
>  int ufshcd_mcq_init(struct ufs_hba *hba);
>  int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba);
>  int ufshcd_mcq_memory_alloc(struct ufs_hba *hba);
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 52c0386..f16d02c 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -784,12 +784,17 @@ static inline bool ufshcd_is_device_present(struct ufs_hba *hba)
>  /**
>   * ufshcd_get_tr_ocs - Get the UTRD Overall Command Status
>   * @lrbp: pointer to local command reference block
> + * @cqe: pointer to the completion queue entry
>   *
>   * This function is used to get the OCS field from UTRD
>   * Returns the OCS field in the UTRD
>   */
> -static enum utp_ocs ufshcd_get_tr_ocs(struct ufshcd_lrb *lrbp)
> +static enum utp_ocs ufshcd_get_tr_ocs(struct ufshcd_lrb *lrbp,
> +				      struct cq_entry *cqe)
>  {
> +	if (cqe)
> +		return le32_to_cpu(cqe->status) & MASK_OCS;
> +
>  	return le32_to_cpu(lrbp->utr_descriptor_ptr->header.dword_2) & MASK_OCS;
>  }
>  
> @@ -3048,7 +3053,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
>  		 * not trigger any race conditions.
>  		 */
>  		hba->dev_cmd.complete = NULL;
> -		err = ufshcd_get_tr_ocs(lrbp);
> +		err = ufshcd_get_tr_ocs(lrbp, hba->dev_cmd.cqe);
>  		if (!err)
>  			err = ufshcd_dev_cmd_completion(hba, lrbp);
>  	} else {
> @@ -5214,18 +5219,20 @@ ufshcd_scsi_cmd_status(struct ufshcd_lrb *lrbp, int scsi_status)
>   * ufshcd_transfer_rsp_status - Get overall status of the response
>   * @hba: per adapter instance
>   * @lrbp: pointer to local reference block of completed command
> + * @cqe: pointer to the completion queue entry
>   *
>   * Returns result of the command to notify SCSI midlayer
>   */
>  static inline int
> -ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
> +ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
> +			   struct cq_entry *cqe)
>  {
>  	int result = 0;
>  	int scsi_status;
>  	enum utp_ocs ocs;
>  
>  	/* overall command status of utrd */
> -	ocs = ufshcd_get_tr_ocs(lrbp);
> +	ocs = ufshcd_get_tr_ocs(lrbp, cqe);
>  
>  	if (hba->quirks & UFSHCD_QUIRK_BROKEN_OCS_FATAL_ERROR) {
>  		if (be32_to_cpu(lrbp->ucd_rsp_ptr->header.dword_1) &
> @@ -5390,42 +5397,53 @@ static void ufshcd_release_scsi_cmd(struct ufs_hba *hba,
>  }
>  
>  /**
> - * __ufshcd_transfer_req_compl - handle SCSI and query command completion
> + * ufshcd_compl_one_cqe - handle a completion queue entry
>   * @hba: per adapter instance
> - * @completed_reqs: bitmask that indicates which requests to complete
> + * @task_tag: the task tag of the request to be completed
> + * @cqe: pointer to the completion queue entry
>   */
> -static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
> -					unsigned long completed_reqs)
> +void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
> +			  struct cq_entry *cqe)
>  {
>  	struct ufshcd_lrb *lrbp;
>  	struct scsi_cmnd *cmd;
> -	int index;
> -
> -	for_each_set_bit(index, &completed_reqs, hba->nutrs) {
> -		lrbp = &hba->lrb[index];
> -		lrbp->compl_time_stamp = ktime_get();
> -		lrbp->compl_time_stamp_local_clock = local_clock();
> -		cmd = lrbp->cmd;
> -		if (cmd) {
> -			if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
> -				ufshcd_update_monitor(hba, lrbp);
> -			ufshcd_add_command_trace(hba, index, UFS_CMD_COMP);
> -			cmd->result = ufshcd_transfer_rsp_status(hba, lrbp);
> -			ufshcd_release_scsi_cmd(hba, lrbp);
> -			/* Do not touch lrbp after scsi done */
> -			scsi_done(cmd);
> -		} else if (lrbp->command_type == UTP_CMD_TYPE_DEV_MANAGE ||
> -			lrbp->command_type == UTP_CMD_TYPE_UFS_STORAGE) {
> -			if (hba->dev_cmd.complete) {
> -				ufshcd_add_command_trace(hba, index,
> -							 UFS_DEV_COMP);
> -				complete(hba->dev_cmd.complete);
> -				ufshcd_clk_scaling_update_busy(hba);
> -			}
> +
> +	lrbp = &hba->lrb[task_tag];
> +	lrbp->compl_time_stamp = ktime_get();
> +	cmd = lrbp->cmd;
> +	if (cmd) {
> +		if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
> +			ufshcd_update_monitor(hba, lrbp);
> +		ufshcd_add_command_trace(hba, task_tag, UFS_CMD_COMP);
> +		cmd->result = ufshcd_transfer_rsp_status(hba, lrbp, cqe);
> +		ufshcd_release_scsi_cmd(hba, lrbp);
> +		/* Do not touch lrbp after scsi done */
> +		scsi_done(cmd);
> +	} else if (lrbp->command_type == UTP_CMD_TYPE_DEV_MANAGE ||
> +		   lrbp->command_type == UTP_CMD_TYPE_UFS_STORAGE) {
> +		if (hba->dev_cmd.complete) {
> +			hba->dev_cmd.cqe = cqe;
> +			ufshcd_add_command_trace(hba, task_tag, UFS_DEV_COMP);
> +			complete(hba->dev_cmd.complete);
> +			ufshcd_clk_scaling_update_busy(hba);
>  		}
>  	}
>  }
>  
> +/**
> + * __ufshcd_transfer_req_compl - handle SCSI and query command completion
> + * @hba: per adapter instance
> + * @completed_reqs: bitmask that indicates which requests to complete
> + */
> +static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
> +					unsigned long completed_reqs)
> +{
> +	int tag;
> +
> +	for_each_set_bit(tag, &completed_reqs, hba->nutrs)
> +		ufshcd_compl_one_cqe(hba, tag, NULL);
> +}
> +
>  /*
>   * Returns > 0 if one or more commands have been completed or 0 if no
>   * requests have been completed.
> -- 
> 2.7.4
>
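The refactoring quoted above can be sketched in userspace C to show the control flow: the legacy bitmask path calls the per-tag helper with a NULL CQE, so the OCS falls back to the transfer request descriptor, while an MCQ caller would pass a real CQE. This is only an illustration; the struct layouts, `NUTRS`, the `MASK_OCS` value, and the unprefixed function names are simplified assumptions, and the endianness conversion done by `le32_to_cpu()` in the patch is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_OCS 0x0F /* assumed: OCS lives in the low bits of the status word */
#define NUTRS    32   /* hypothetical number of transfer request slots */

struct cq_entry { uint32_t status; };   /* simplified stand-in */
struct lrb {
	uint32_t utrd_dword_2;          /* stand-in for the UTRD header word */
	uint32_t ocs;                   /* last OCS seen, recorded for the demo */
	int completed;
};

static struct lrb lrbs[NUTRS];

/* Prefer the CQE status when one is supplied, else read the descriptor. */
static uint32_t get_tr_ocs(const struct lrb *lrbp, const struct cq_entry *cqe)
{
	if (cqe)
		return cqe->status & MASK_OCS;
	return lrbp->utrd_dword_2 & MASK_OCS;
}

/* Complete exactly one request, identified by its tag. */
static void compl_one_cqe(unsigned int tag, const struct cq_entry *cqe)
{
	assert(tag < NUTRS);
	lrbs[tag].ocs = get_tr_ocs(&lrbs[tag], cqe);
	lrbs[tag].completed = 1;
}

/* Legacy path: walk the completion bitmask, no CQE available. */
static void transfer_req_compl(unsigned long completed_reqs)
{
	for (unsigned int tag = 0; tag < NUTRS; tag++)
		if (completed_reqs & (1UL << tag))
			compl_one_cqe(tag, NULL);
}
```

Keeping the bitmask walk as a thin wrapper means both modes share one completion routine, which appears to be the point of the split.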
Manivannan Sadhasivam Nov. 28, 2022, 5:02 p.m. UTC | #9
On Tue, Nov 22, 2022 at 08:10:28PM -0800, Asutosh Das wrote:
> Complete CQE requests in poll. The assumption is that
> several poll completions may happen on different CPUs
> for the same completion queue, hence spin lock
> protection is added.
> 
> Co-developed-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Can Guo <quic_cang@quicinc.com>
> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

Thanks,
Mani

> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufs-mcq.c     | 13 +++++++++++++
>  drivers/ufs/core/ufshcd-priv.h |  2 ++
>  drivers/ufs/core/ufshcd.c      |  7 +++++++
>  include/ufs/ufshcd.h           |  2 ++
>  4 files changed, 24 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index 365ad98..5311857 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -387,6 +387,18 @@ unsigned long ufshcd_mcq_poll_cqe_nolock(struct ufs_hba *hba,
>  	return completed_reqs;
>  }
>  
> +unsigned long ufshcd_mcq_poll_cqe_lock(struct ufs_hba *hba,
> +				       struct ufs_hw_queue *hwq)
> +{
> +	unsigned long completed_reqs;
> +
> +	spin_lock(&hwq->cq_lock);
> +	completed_reqs = ufshcd_mcq_poll_cqe_nolock(hba, hwq);
> +	spin_unlock(&hwq->cq_lock);
> +
> +	return completed_reqs;
> +}
> +
>  void ufshcd_mcq_make_queues_operational(struct ufs_hba *hba)
>  {
>  	struct ufs_hw_queue *hwq;
> @@ -483,6 +495,7 @@ int ufshcd_mcq_init(struct ufs_hba *hba)
>  		hwq = &hba->uhq[i];
>  		hwq->max_entries = hba->nutrs;
>  		spin_lock_init(&hwq->sq_lock);
> +		spin_lock_init(&hwq->cq_lock);
>  	}
>  
>  	/* The very first HW queue serves device commands */
> diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
> index c5b5bf3..73ce8a2 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -75,6 +75,8 @@ unsigned long ufshcd_mcq_poll_cqe_nolock(struct ufs_hba *hba,
>  					 struct ufs_hw_queue *hwq);
>  struct ufs_hw_queue *ufshcd_mcq_req_to_hwq(struct ufs_hba *hba,
>  					   struct request *req);
> +unsigned long ufshcd_mcq_poll_cqe_lock(struct ufs_hba *hba,
> +				       struct ufs_hw_queue *hwq);
>  
>  #define UFSHCD_MCQ_IO_QUEUE_OFFSET	1
>  #define SD_ASCII_STD true
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 7fb7c5f..8416d42 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5453,6 +5453,13 @@ static int ufshcd_poll(struct Scsi_Host *shost, unsigned int queue_num)
>  	struct ufs_hba *hba = shost_priv(shost);
>  	unsigned long completed_reqs, flags;
>  	u32 tr_doorbell;
> +	struct ufs_hw_queue *hwq;
> +
> +	if (is_mcq_enabled(hba)) {
> +		hwq = &hba->uhq[queue_num + UFSHCD_MCQ_IO_QUEUE_OFFSET];
> +
> +		return ufshcd_mcq_poll_cqe_lock(hba, hwq);
> +	}
>  
>  	spin_lock_irqsave(&hba->outstanding_lock, flags);
>  	tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index d5fde64..a709391 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -1069,6 +1069,7 @@ struct ufs_hba {
>   * @sq_lock: serialize submission queue access
>   * @cq_tail_slot: current slot to which CQ tail pointer is pointing
>   * @cq_head_slot: current slot to which CQ head pointer is pointing
> + * @cq_lock: Synchronize between multiple polling instances
>   */
>  struct ufs_hw_queue {
>  	void __iomem *mcq_sq_head;
> @@ -1086,6 +1087,7 @@ struct ufs_hw_queue {
>  	spinlock_t sq_lock;
>  	u32 cq_tail_slot;
>  	u32 cq_head_slot;
> +	spinlock_t cq_lock;
>  };
>  
>  static inline bool is_mcq_enabled(struct ufs_hba *hba)
> -- 
> 2.7.4
>
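A userspace sketch of the locked poll wrapper described in this patch follows. It is not the kernel code: a C11 `atomic_flag` stands in for `spinlock_t`, the ring size `CQ_ENTRIES` and the head/tail walk inside `poll_cqe_nolock()` are hypothetical, and `NUM_HWQ` is an invented queue count. The one detail taken from the patch is that poll queue `queue_num` maps to hardware queue `queue_num + 1`, because the first hardware queue serves device commands.

```c
#include <assert.h>
#include <stdatomic.h>

#define CQ_ENTRIES      8 /* hypothetical ring size */
#define NUM_HWQ         4 /* hypothetical number of hardware queues */
#define IO_QUEUE_OFFSET 1 /* queue 0 is reserved for device commands */

struct hw_queue {
	atomic_flag cq_lock;        /* stand-in for spinlock_t cq_lock */
	unsigned int cq_head_slot;
	unsigned int cq_tail_slot;
};

static struct hw_queue uhq[NUM_HWQ];

static void hwq_init(struct hw_queue *q)
{
	atomic_flag_clear(&q->cq_lock);
	q->cq_head_slot = 0;
	q->cq_tail_slot = 0;
}

/* Consume every pending entry; the caller must hold cq_lock. */
static unsigned long poll_cqe_nolock(struct hw_queue *q)
{
	unsigned long completed = 0;

	assert(q->cq_tail_slot < CQ_ENTRIES);
	while (q->cq_head_slot != q->cq_tail_slot) {
		q->cq_head_slot = (q->cq_head_slot + 1) % CQ_ENTRIES;
		completed++;
	}
	return completed;
}

/* Serialize pollers that may run on different CPUs for the same queue. */
static unsigned long poll_cqe_lock(struct hw_queue *q)
{
	unsigned long completed;

	while (atomic_flag_test_and_set_explicit(&q->cq_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
	completed = poll_cqe_nolock(q);
	atomic_flag_clear_explicit(&q->cq_lock, memory_order_release);
	return completed;
}

/* Map a poll queue number to its hardware queue and poll it. */
static unsigned long poll_queue(unsigned int queue_num)
{
	assert(queue_num + IO_QUEUE_OFFSET < NUM_HWQ);
	return poll_cqe_lock(&uhq[queue_num + IO_QUEUE_OFFSET]);
}
```

Without the lock, two pollers could both read the same head slot and complete the same entry twice.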
Asutosh Das Nov. 28, 2022, 7:54 p.m. UTC | #10
On Mon, Nov 28 2022 at 07:15 -0800, Manivannan Sadhasivam wrote:
>On Tue, Nov 22, 2022 at 08:10:20PM -0800, Asutosh Das wrote:
>> The UFS device defines the supported queue depth by
>> bqueuedepth, which has a max value of 256.
>> The HC defines MAC (Max Active Commands), which defines
>> the max number of commands that can be in flight to the UFS
>> device.
>> Calculate and configure the nutrs based on both these
>> values.
>>
>> Co-developed-by: Can Guo <quic_cang@quicinc.com>
>> Signed-off-by: Can Guo <quic_cang@quicinc.com>
>> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com>
>> ---
>>  drivers/ufs/core/ufs-mcq.c     | 32 ++++++++++++++++++++++++++++++++
>>  drivers/ufs/core/ufshcd-priv.h |  9 +++++++++
>>  drivers/ufs/core/ufshcd.c      | 17 ++++++++++++++++-
>>  drivers/ufs/host/ufs-qcom.c    |  8 ++++++++
>>  include/ufs/ufs.h              |  2 ++
>>  include/ufs/ufshcd.h           |  2 ++
>>  include/ufs/ufshci.h           |  1 +
>>  7 files changed, 70 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
>> index 4aaa6aa..e95f748 100644
>> --- a/drivers/ufs/core/ufs-mcq.c
>> +++ b/drivers/ufs/core/ufs-mcq.c
>> @@ -18,6 +18,8 @@
>>  #define UFS_MCQ_NUM_DEV_CMD_QUEUES 1
>>  #define UFS_MCQ_MIN_POLL_QUEUES 0
>>
>> +#define MAX_DEV_CMD_ENTRIES	2
>> +#define MCQ_CFG_MAC_MASK	GENMASK(16, 8)
>>  #define MCQ_QCFGPTR_MASK	GENMASK(7, 0)
>>  #define MCQ_QCFGPTR_UNIT	0x200
>>  #define MCQ_SQATTR_OFFSET(c) \
>> @@ -88,6 +90,36 @@ static const struct ufshcd_res_info ufs_res_info[RES_MAX] = {
>>  	{.name = "mcq_vs",},
>>  };
[...]

Hello Mani,
Thanks for taking a look.

>> +	WARN_ON(!hba->dev_info.bqueuedepth);
>
>Instead of panic, you could just print and return an error.
>
I'd make it WARN_ON_ONCE();

>> +	/*
>> +	 * max. value of bqueuedepth = 256, mac is host dependent.
>> +	 * It is mandatory for UFS device to define bQueueDepth if
>> +	 * shared queuing architecture is enabled.
>> +	 */
>> +	return min_t(int, mac, hba->dev_info.bqueuedepth);
>> +}
>> +
>>  static int ufshcd_mcq_config_resource(struct ufs_hba *hba)
>>  {
>>  	struct platform_device *pdev = to_platform_device(hba->dev);
>> diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
>> index 9368ba2..9f40fa5 100644
>> --- a/drivers/ufs/core/ufshcd-priv.h
>> +++ b/drivers/ufs/core/ufshcd-priv.h
>> @@ -62,6 +62,7 @@ int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
>>  	enum flag_idn idn, u8 index, bool *flag_res);
>>  void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
>>  int ufshcd_mcq_init(struct ufs_hba *hba);
>> +int ufshcd_mcq_decide_queue_depth(struct ufs_hba *hba);
>>
>>  #define SD_ASCII_STD true
>>  #define SD_RAW false
>> @@ -227,6 +228,14 @@ static inline void ufshcd_vops_config_scaling_param(struct ufs_hba *hba,
>>  		hba->vops->config_scaling_param(hba, p, data);
>>  }
>>
>> +static inline int ufshcd_mcq_vops_get_hba_mac(struct ufs_hba *hba)
>
>Again, no inline please.
>
It spits out the following warning for all files that include this header, when
inline is removed:
warning: 'ufshcd_mcq_vops_get_hba_mac' defined but not used [-Wunused-function]

[...]
>> +#define MAX_SUPP_MAC 64
>
>Similar definitions are part of ufs-qcom.h.
>
>Thanks,
>Mani
>
>
>-- 
>மணிவண்ணன் சதாசிவம்
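The queue-depth rule under discussion (take the smaller of the host's MAC and the device's bqueuedepth) can be sketched as below, using the "return an error" variant Mani suggests rather than the WARN_ON() in the posted patch. The function name, its two plain integer parameters, and the choice of -EINVAL are assumptions for illustration; how the patch obtains `mac` is elided in the quoted diff and is not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: pick the usable queue depth.
 * mac         - max active commands supported by the host controller
 * bqueuedepth - queue depth advertised by the device (max 256)
 */
static int decide_queue_depth(int mac, int bqueuedepth)
{
	/* A device using shared queuing must define a non-zero bQueueDepth. */
	if (bqueuedepth <= 0)
		return -EINVAL;

	return mac < bqueuedepth ? mac : bqueuedepth;
}
```

For example, a host with a MAC of 64 and a device advertising a depth of 32 would end up with 32 slots.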
Bart Van Assche Nov. 28, 2022, 8:33 p.m. UTC | #11
On 11/28/22 11:54, Asutosh Das wrote:
> On Mon, Nov 28 2022 at 07:15 -0800, Manivannan Sadhasivam wrote:
>> On Tue, Nov 22, 2022 at 08:10:20PM -0800, Asutosh Das wrote:
>>> +static inline int ufshcd_mcq_vops_get_hba_mac(struct ufs_hba *hba)
>>
>> Again, no inline please.
>>
> It spits out the following warning for all files that include this header,
> when inline is removed:
> warning: 'ufshcd_mcq_vops_get_hba_mac' defined but not used [-Wunused-function]

My understanding is that the "no inline" rule applies to .c files only 
and also that functions defined in header files should be declared 
"static inline".

Thanks,

Bart.
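
To illustrate the point about headers: a function defined as plain `static` in a header is emitted into every file that includes it, and GCC reports -Wunused-function wherever it is not called. Marking it `static inline` suppresses that warning. The sketch below imitates the shape of the vops wrapper; `struct ops`, `example_get_mac()`, and the -EOPNOTSUPP fallback are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* --- what would live in a shared header --- */

struct ops {
	int (*get_hba_mac)(void); /* optional vendor callback */
};

/*
 * 'static inline' keeps this header-safe: files that include the header
 * but never call the wrapper do not trigger -Wunused-function.
 */
static inline int vops_get_hba_mac(const struct ops *ops)
{
	if (ops && ops->get_hba_mac)
		return ops->get_hba_mac();

	return -EOPNOTSUPP; /* assumed "no callback" result */
}

/* --- what would live in a vendor .c file --- */

/* Hypothetical vendor callback reporting a fixed MAC. */
int example_get_mac(void)
{
	return 64;
}
```

In a .c file the same wrapper would normally drop `inline` and leave the inlining decision to the compiler, which matches the rule as Bart describes it.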