diff mbox series

[2/2] scsi: ufs: Use SYNCHRONIZE CACHE instead of FUA

Message ID 20230201180637.2102556-3-bvanassche@acm.org
State New
Headers show
Series Use SYNCHRONIZE CACHE instead of FUA for UFS devices | expand

Commit Message

Bart Van Assche Feb. 1, 2023, 6:06 p.m. UTC
From: Asutosh Das <asutoshd@codeaurora.org>

UFS devices perform better when using SYNCHRONIZE CACHE command
instead of the FUA flag. Hence this patch.

Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ bvanassche: modified a source code comment ]
---
 drivers/ufs/core/ufshcd.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Daejun Park Feb. 2, 2023, 1:54 a.m. UTC | #1
Hi Bart,
> From: Asutosh Das <asutoshd@codeaurora.org>
> 
> UFS devices perform better when using SYNCHRONIZE CACHE command
> instead of the FUA flag. Hence this patch.

If you have, could you share the result when using SYNCHRONIZE CACHE command?

Thanks,
Daejun

> 
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> [ bvanassche: modified a source code comment ]
> ---
>  drivers/ufs/core/ufshcd.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index bf3cb12ef02f..461aa51cfccc 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5056,6 +5056,9 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev)
>          /* WRITE_SAME command is not supported */
>          sdev->no_write_same = 1;
>  
> +        /* Use SYNCHRONIZE CACHE instead of FUA to improve performance */
> +        sdev->sdev_bflags = BLIST_BROKEN_FUA;
> +
>          ufshcd_lu_init(hba, sdev);
>  
>          ufshcd_setup_links(hba, sdev);
>
kernel test robot Feb. 2, 2023, 4:32 a.m. UTC | #2
Hi Bart,

I love your patch! Yet something to improve:

[auto build test ERROR on mkp-scsi/for-next]
[also build test ERROR on jejb-scsi/for-next linus/master v6.2-rc6 next-20230202]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Bart-Van-Assche/scsi-core-Introduce-the-BLIST_BROKEN_FUA-flag/20230202-021019
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
patch link:    https://lore.kernel.org/r/20230201180637.2102556-3-bvanassche%40acm.org
patch subject: [PATCH 2/2] scsi: ufs: Use SYNCHRONIZE CACHE instead of FUA
config: arm-randconfig-r046-20230130 (https://download.01.org/0day-ci/archive/20230202/202302021202.mM5iFxSO-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/4edde0173805f04faa8e79aab4de3e929ea4b7c0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Bart-Van-Assche/scsi-core-Introduce-the-BLIST_BROKEN_FUA-flag/20230202-021019
        git checkout 4edde0173805f04faa8e79aab4de3e929ea4b7c0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/ufs/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/ufs/core/ufshcd.c: In function 'ufshcd_slave_alloc':
>> drivers/ufs/core/ufshcd.c:5060:29: error: 'BLIST_BROKEN_FUA' undeclared (first use in this function)
    5060 |         sdev->sdev_bflags = BLIST_BROKEN_FUA;
         |                             ^~~~~~~~~~~~~~~~
   drivers/ufs/core/ufshcd.c:5060:29: note: each undeclared identifier is reported only once for each function it appears in


vim +/BLIST_BROKEN_FUA +5060 drivers/ufs/core/ufshcd.c

  5031	
  5032	/**
  5033	 * ufshcd_slave_alloc - handle initial SCSI device configurations
  5034	 * @sdev: pointer to SCSI device
  5035	 *
  5036	 * Returns success
  5037	 */
  5038	static int ufshcd_slave_alloc(struct scsi_device *sdev)
  5039	{
  5040		struct ufs_hba *hba;
  5041	
  5042		hba = shost_priv(sdev->host);
  5043	
  5044		/* Mode sense(6) is not supported by UFS, so use Mode sense(10) */
  5045		sdev->use_10_for_ms = 1;
  5046	
  5047		/* DBD field should be set to 1 in mode sense(10) */
  5048		sdev->set_dbd_for_ms = 1;
  5049	
  5050		/* allow SCSI layer to restart the device in case of errors */
  5051		sdev->allow_restart = 1;
  5052	
  5053		/* REPORT SUPPORTED OPERATION CODES is not supported */
  5054		sdev->no_report_opcodes = 1;
  5055	
  5056		/* WRITE_SAME command is not supported */
  5057		sdev->no_write_same = 1;
  5058	
  5059		/* Use SYNCHRONIZE CACHE instead of FUA to improve performance */
> 5060		sdev->sdev_bflags = BLIST_BROKEN_FUA;
  5061	
  5062		ufshcd_lu_init(hba, sdev);
  5063	
  5064		ufshcd_setup_links(hba, sdev);
  5065	
  5066		return 0;
  5067	}
  5068
Adrian Hunter Feb. 2, 2023, 7:52 a.m. UTC | #3
On 1/02/23 20:06, Bart Van Assche wrote:
> From: Asutosh Das <asutoshd@codeaurora.org>
> 
> UFS devices perform better when using SYNCHRONIZE CACHE command
> instead of the FUA flag. Hence this patch.

Hi

It would be nice to get some clarification on what is
going on for this case.

This includes with Data Reliability enabled?

In theory, WRITE+FUA should be at least as fast as
WRITE+SYNCHRONIZE CACHE, right?

Do we have any explanation for why that would not
be true?

In particular, is SYNCHRONIZE CACHE faster because
it is not, in fact, providing Reliable Writes?

> 
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> [ bvanassche: modified a source code comment ]
> ---
>  drivers/ufs/core/ufshcd.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index bf3cb12ef02f..461aa51cfccc 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5056,6 +5056,9 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev)
>  	/* WRITE_SAME command is not supported */
>  	sdev->no_write_same = 1;
>  
> +	/* Use SYNCHRONIZE CACHE instead of FUA to improve performance */
> +	sdev->sdev_bflags = BLIST_BROKEN_FUA;
> +
>  	ufshcd_lu_init(hba, sdev);
>  
>  	ufshcd_setup_links(hba, sdev);
kernel test robot Feb. 2, 2023, 9:01 a.m. UTC | #4
Hi Bart,

I love your patch! Yet something to improve:

[auto build test ERROR on mkp-scsi/for-next]
[also build test ERROR on jejb-scsi/for-next linus/master v6.2-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Bart-Van-Assche/scsi-core-Introduce-the-BLIST_BROKEN_FUA-flag/20230202-021019
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
patch link:    https://lore.kernel.org/r/20230201180637.2102556-3-bvanassche%40acm.org
patch subject: [PATCH 2/2] scsi: ufs: Use SYNCHRONIZE CACHE instead of FUA
config: x86_64-randconfig-a012-20230130 (https://download.01.org/0day-ci/archive/20230202/202302021626.GfHCwXIh-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/4edde0173805f04faa8e79aab4de3e929ea4b7c0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Bart-Van-Assche/scsi-core-Introduce-the-BLIST_BROKEN_FUA-flag/20230202-021019
        git checkout 4edde0173805f04faa8e79aab4de3e929ea4b7c0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/ufs/core/ufshcd.c:5060:22: error: use of undeclared identifier 'BLIST_BROKEN_FUA'
           sdev->sdev_bflags = BLIST_BROKEN_FUA;
                               ^
   drivers/ufs/core/ufshcd.c:9988:44: warning: shift count >= width of type [-Wshift-count-overflow]
                   if (!dma_set_mask_and_coherent(hba->dev, DMA_BIT_MASK(64)))
                                                            ^~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK'
   #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
                                                        ^ ~~~
   1 warning and 1 error generated.


vim +/BLIST_BROKEN_FUA +5060 drivers/ufs/core/ufshcd.c

  5031	
  5032	/**
  5033	 * ufshcd_slave_alloc - handle initial SCSI device configurations
  5034	 * @sdev: pointer to SCSI device
  5035	 *
  5036	 * Returns success
  5037	 */
  5038	static int ufshcd_slave_alloc(struct scsi_device *sdev)
  5039	{
  5040		struct ufs_hba *hba;
  5041	
  5042		hba = shost_priv(sdev->host);
  5043	
  5044		/* Mode sense(6) is not supported by UFS, so use Mode sense(10) */
  5045		sdev->use_10_for_ms = 1;
  5046	
  5047		/* DBD field should be set to 1 in mode sense(10) */
  5048		sdev->set_dbd_for_ms = 1;
  5049	
  5050		/* allow SCSI layer to restart the device in case of errors */
  5051		sdev->allow_restart = 1;
  5052	
  5053		/* REPORT SUPPORTED OPERATION CODES is not supported */
  5054		sdev->no_report_opcodes = 1;
  5055	
  5056		/* WRITE_SAME command is not supported */
  5057		sdev->no_write_same = 1;
  5058	
  5059		/* Use SYNCHRONIZE CACHE instead of FUA to improve performance */
> 5060		sdev->sdev_bflags = BLIST_BROKEN_FUA;
  5061	
  5062		ufshcd_lu_init(hba, sdev);
  5063	
  5064		ufshcd_setup_links(hba, sdev);
  5065	
  5066		return 0;
  5067	}
  5068
Bart Van Assche Feb. 2, 2023, 6:09 p.m. UTC | #5
On 2/1/23 23:52, Adrian Hunter wrote:
> On 1/02/23 20:06, Bart Van Assche wrote:
>> UFS devices perform better when using SYNCHRONIZE CACHE command
>> instead of the FUA flag. Hence this patch.
> 
> It would be nice to get some clarification on what is
> going on for this case.
> 
> This includes with Data Reliability enabled?
> 
> In theory, WRITE+FUA should be at least as fast as
> WRITE+SYNCHRONIZE CACHE, right?
> 
> Do we have any explanation for why that would not
> be true?
> 
> In particular, is SYNCHRONIZE CACHE faster because
> it is not, in fact, providing Reliable Writes?
  Hi Adrian,

Setting the FUA bit in a WRITE command is functionally equivalent to 
submitting a WRITE command without FUA and submitting a SYNCHRONIZE 
CACHE command afterwards. For both sequences the storage device has to 
guarantee that the written data will survive a sudden power loss event.

It is not clear to me why WRITE + SYNCHRONIZE CACHE is faster than WRITE 
+ FUA. All I know is that this behavior has been observed for multiple 
UFS devices from multiple vendors. I hope that one of the UFS vendors 
can provide more information.

Bart.
James Bottomley Feb. 2, 2023, 6:46 p.m. UTC | #6
On Thu, 2023-02-02 at 10:09 -0800, Bart Van Assche wrote:
> On 2/1/23 23:52, Adrian Hunter wrote:
> > On 1/02/23 20:06, Bart Van Assche wrote:
> > > UFS devices perform better when using SYNCHRONIZE CACHE command
> > > instead of the FUA flag. Hence this patch.
> > 
> > It would be nice to get some clarification on what is
> > going on for this case.
> > 
> > This includes with Data Reliability enabled?
> > 
> > In theory, WRITE+FUA should be at least as fast as
> > WRITE+SYNCHRONIZE CACHE, right?
> > 
> > Do we have any explanation for why that would not
> > be true?
> > 
> > In particular, is SYNCHRONIZE CACHE faster because
> > it is not, in fact, providing Reliable Writes?
>   Hi Adrian,
> 
> Setting the FUA bit in a WRITE command is functionally equivalent to 
> submitting a WRITE command without FUA and submitting a SYNCHRONIZE 
> CACHE command afterwards. For both sequences the storage device has
> to guarantee that the written data will survive a sudden power loss
> event.

Well, that may not be true in all situations.  Semantically FUA is a
barrier: it can be implemented such that it destages only the current
write plus the cache writes that occurred before the write with the
FUA.  It could also be implemented as you suggest above, which simply
destages the entire cache, but it doesn't have to be.  One of the
reasons for FUA to exist is the potential difference between the two.

James
Bart Van Assche Feb. 2, 2023, 7 p.m. UTC | #7
On 2/2/23 10:46, James Bottomley wrote:
> Well, that may not be true in all situations.  Semantically FUA is a
> barrier: it can be implemented such that it destages only the current
> write plus the cache writes that occurred before the write with the
> FUA.  It could also be implemented as you suggest above, which simply
> destages the entire cache, but it doesn't have to be.  One of the
> reasons for FUA to exist is the potential difference between the two.

Hi James,

Although support for the barrier concept has been removed from the block 
layer, would it be possible to tell me in which T10 document I can find 
more information about the barrier semantics? All I found in the latest 
SBC-5 draft (revision 4; 2023-01-24) about FUA is the following (section 
5.40 WRITE (10)):

"A force unit access (FUA) bit set to one specifies that the device 
server shall write the logical blocks to:
a) the non-volatile cache, if any; or
b) the medium.
An FUA bit set to zero specifies that the device server shall write the 
logical blocks to:
a) volatile cache, if any;
b) non-volatile cache, if any; or
c) the medium."

To me the description of FUA in the SBC-3 draft from 11 November 2013 
seems identical to the above text.

Thanks,

Bart.
James Bottomley Feb. 2, 2023, 10:13 p.m. UTC | #8
On Thu, 2023-02-02 at 11:00 -0800, Bart Van Assche wrote:
> On 2/2/23 10:46, James Bottomley wrote:
> > Well, that may not be true in all situations.  Semantically FUA is
> > a barrier: it can be implemented such that it destages only the
> > current write plus the cache writes that occurred before the write
> > with the FUA.  It could also be implemented as you suggest above,
> > which simply destages the entire cache, but it doesn't have to be. 
> > One of the reasons for FUA to exist is the potential difference
> > between the two.
> 
> Hi James,
> 
> Although support for the barrier concept has been removed from the
> block layer, would it be possible to tell me in which T10 document I
> can find  more information about the barrier semantics? All I found
> in the latest  SBC-5 draft (revision 4; 2023-01-24) about FUA is the
> following (section 5.40 WRITE (10)):

I have only a vague recollection of manufacturers implementing this
semantic but ...

> "A force unit access (FUA) bit set to one specifies that the device 
> server shall write the logical blocks to:
> a) the non-volatile cache, if any; or
> b) the medium.
> An FUA bit set to zero specifies that the device server shall write
> the logical blocks to:
> a) volatile cache, if any;
> b) non-volatile cache, if any; or
> c) the medium."
> 
> To me the description of FUA in the SBC-3 draft from 11 November 2013
> seems identical to the above text.

So what that says is the FUA write writes to the medium and *doesn't*
flush the volatile cache (so any writeback data stays there).  I assume
this style is for metadata reservations, so we guarantee fs tree
consistency but not necessarily file data consistency.  However, that
makes flushing everything a way bigger hammer than this behaviour.

James
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index bf3cb12ef02f..461aa51cfccc 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5056,6 +5056,9 @@  static int ufshcd_slave_alloc(struct scsi_device *sdev)
 	/* WRITE_SAME command is not supported */
 	sdev->no_write_same = 1;
 
+	/* Use SYNCHRONIZE CACHE instead of FUA to improve performance */
+	sdev->sdev_bflags = BLIST_BROKEN_FUA;
+
 	ufshcd_lu_init(hba, sdev);
 
 	ufshcd_setup_links(hba, sdev);