diff mbox series

[v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem

Message ID 20240301192745.14987-1-21cnbao@gmail.com
State New
Headers show
Series [v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem | expand

Commit Message

Barry Song March 1, 2024, 7:27 p.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

while sg_nents is 1, which is always true for the current kernel
as the only user - zswap is this case, we might have a chance to
remove memcpy, thus improve the performance.
Though sg_nents is 1, its buffer might cross two pages. If those
pages are highmem, we have no cheap way to map them to contiguous
virtual address because kmap doesn't support more than one page
(kmap single higmem page could be still expensive for tlb) and
vmap is expensive.
So we also test and enure page is not highmem in order to safely
use page_to_virt before removing the memcpy. The good news is
that in the most majority of cases, we are lowmem, and we are
always lowmem in those modern and popular hardware.

Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 -v7:
 * fix the problem pointed out by Herbert - flush all pages if dst
   is longer than one page.

 crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

Comments

Herbert Xu March 8, 2024, 11:26 a.m. UTC | #1
On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> while sg_nents is 1, which is always true for the current kernel
> as the only user - zswap is this case, we might have a chance to
> remove memcpy, thus improve the performance.
> Though sg_nents is 1, its buffer might cross two pages. If those
> pages are highmem, we have no cheap way to map them to contiguous
> virtual address because kmap doesn't support more than one page
> (kmap single higmem page could be still expensive for tlb) and
> vmap is expensive.
> So we also test and enure page is not highmem in order to safely
> use page_to_virt before removing the memcpy. The good news is
> that in the most majority of cases, we are lowmem, and we are
> always lowmem in those modern and popular hardware.
> 
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nhat Pham <nphamcs@gmail.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  -v7:
>  * fix the problem pointed out by Herbert - flush all pages if dst
>    is longer than one page.
> 
>  crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
>  1 file changed, 29 insertions(+), 7 deletions(-)

Patch applied.  Thanks.
Guenter Roeck March 17, 2024, 11:13 p.m. UTC | #2
Hi,

On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
[ ... ]
> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>  			ret = -ENOSPC;
>  			goto out;
>  		}
> -		scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> -					 1);
> +		if (dst == scratch->dst) {
> +			scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
> +						 req->dlen, 1);
> +		} else {
> +			int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
> +			int i;
> +			struct page *dst_page = sg_page(req->dst);
> +
> +			for (i = 0; i < nr_pages; i++)
> +				flush_dcache_page(dst_page + i);

flush_dcache_page() is an empty macro on some architectures
such as xtensa. This results in

Building xtensa:allmodconfig ... failed
--------------
Error log:
crypto/scompress.c: In function 'scomp_acomp_comp_decomp':
crypto/scompress.c:174:38: error: unused variable 'dst_page'

on the affected architectures.

Guenter
Barry Song March 17, 2024, 11:48 p.m. UTC | #3
On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> Hi,
>
> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
> [ ... ]
> > @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >                       ret = -ENOSPC;
> >                       goto out;
> >               }
> > -             scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> > -                                      1);
> > +             if (dst == scratch->dst) {
> > +                     scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
> > +                                              req->dlen, 1);
> > +             } else {
> > +                     int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
> > +                     int i;
> > +                     struct page *dst_page = sg_page(req->dst);
> > +
> > +                     for (i = 0; i < nr_pages; i++)
> > +                             flush_dcache_page(dst_page + i);
>
> flush_dcache_page() is an empty macro on some architectures
> such as xtensa. This results in

Hi Guenter,

this is a bug of xtensa, could you test the patch:
https://lore.kernel.org/all/20240313045036.51065-1-21cnbao@gmail.com/


>
> Building xtensa:allmodconfig ... failed
> --------------
> Error log:
> crypto/scompress.c: In function 'scomp_acomp_comp_decomp':
> crypto/scompress.c:174:38: error: unused variable 'dst_page'
>
> on the affected architectures.
>
> Guenter

Thanks
Barry
Guenter Roeck March 18, 2024, 12:13 a.m. UTC | #4
On 3/17/24 16:48, Barry Song wrote:
> On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> Hi,
>>
>> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
>> [ ... ]
>>> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>>>                        ret = -ENOSPC;
>>>                        goto out;
>>>                }
>>> -             scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
>>> -                                      1);
>>> +             if (dst == scratch->dst) {
>>> +                     scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
>>> +                                              req->dlen, 1);
>>> +             } else {
>>> +                     int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
>>> +                     int i;
>>> +                     struct page *dst_page = sg_page(req->dst);
>>> +
>>> +                     for (i = 0; i < nr_pages; i++)
>>> +                             flush_dcache_page(dst_page + i);
>>
>> flush_dcache_page() is an empty macro on some architectures
>> such as xtensa. This results in
> 
> Hi Guenter,
> 
> this is a bug of xtensa, could you test the patch:

Thanks for the update.

> https://lore.kernel.org/all/20240313045036.51065-1-21cnbao@gmail.com/
> 

That doesn't build for me.

   CC      arch/xtensa/kernel/asm-offsets.s
In file included from ./include/linux/highmem.h:8,
                  from ./include/linux/bvec.h:10,
                  from ./include/linux/blk_types.h:10,
                  from ./include/linux/writeback.h:13,
                  from ./include/linux/memcontrol.h:23,
                  from ./include/linux/swap.h:9,
                  from ./include/linux/suspend.h:5,
                  from arch/xtensa/kernel/asm-offsets.c:24:
./include/linux/cacheflush.h:9:5: error: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined, evaluates to 0 [-Werror=undef]
     9 | #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE

A similar works for loongarch, so I don't know what is wrong.
Maybe some context patch is missing.

Guenter
Guenter Roeck March 18, 2024, 12:33 a.m. UTC | #5
On 3/17/24 17:21, Barry Song wrote:
> On Mon, Mar 18, 2024 at 8:14 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On 3/17/24 16:48, Barry Song wrote:
>>> On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
>>>> [ ... ]
>>>>> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>>>>>                         ret = -ENOSPC;
>>>>>                         goto out;
>>>>>                 }
>>>>> -             scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
>>>>> -                                      1);
>>>>> +             if (dst == scratch->dst) {
>>>>> +                     scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
>>>>> +                                              req->dlen, 1);
>>>>> +             } else {
>>>>> +                     int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
>>>>> +                     int i;
>>>>> +                     struct page *dst_page = sg_page(req->dst);
>>>>> +
>>>>> +                     for (i = 0; i < nr_pages; i++)
>>>>> +                             flush_dcache_page(dst_page + i);
>>>>
>>>> flush_dcache_page() is an empty macro on some architectures
>>>> such as xtensa. This results in
>>>
>>> Hi Guenter,
>>>
>>> this is a bug of xtensa, could you test the patch:
>>
>> Thanks for the update.
>>
>>> https://lore.kernel.org/all/20240313045036.51065-1-21cnbao@gmail.com/
>>>
>>
>> That doesn't build for me.
>>
>>     CC      arch/xtensa/kernel/asm-offsets.s
>> In file included from ./include/linux/highmem.h:8,
>>                    from ./include/linux/bvec.h:10,
>>                    from ./include/linux/blk_types.h:10,
>>                    from ./include/linux/writeback.h:13,
>>                    from ./include/linux/memcontrol.h:23,
>>                    from ./include/linux/swap.h:9,
>>                    from ./include/linux/suspend.h:5,
>>                    from arch/xtensa/kernel/asm-offsets.c:24:
>> ./include/linux/cacheflush.h:9:5: error: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined, evaluates to 0 [-Werror=undef]
>>       9 | #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
>>
>> A similar works for loongarch, so I don't know what is wrong.
>> Maybe some context patch is missing.
> 
> this is weird as include/asm-generic/cacheflush.h will define it to 0
> 
>   #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
>   static inline void flush_dcache_page(struct page *page)
>   {
>   }
> 
>   #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
>   #endif
> 
> Maybe somewhere else also needs to be fixed.
> Can you tell me your toolchain version and toolchain name? and defconfig name?
> 

config is allmodconfig - GCC_PLUGINS.

Toolchains are self-built from gcc source using the buildall system. I tried
gcc 11.4, 12.3, and 13.2. I don't think that matters, though, since
"asm-generic/cacheflush.h" isn't included from arch/xtensa. The diff below
seems to fix the problem, but I have not fully tested it.

Guenter

---
diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 11efdc8eb262..62662919cbbc 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -183,4 +183,6 @@ extern void copy_from_user_page(struct vm_area_struct*, struct page*,

  #endif

+#include <asm-generic/cacheflush.h>
+
  #endif /* _XTENSA_CACHEFLUSH_H */
diff mbox series

Patch

diff --git a/crypto/scompress.c b/crypto/scompress.c
index b108a30a7600..60bbb7ea4060 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@  static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	struct crypto_scomp *scomp = *tfm_ctx;
 	void **ctx = acomp_request_ctx(req);
 	struct scomp_scratch *scratch;
+	void *src, *dst;
 	unsigned int dlen;
 	int ret;
 
@@ -134,13 +135,25 @@  static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	scratch = raw_cpu_ptr(&scomp_scratch);
 	spin_lock(&scratch->lock);
 
-	scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0);
+	if (sg_nents(req->src) == 1 && !PageHighMem(sg_page(req->src))) {
+		src = page_to_virt(sg_page(req->src)) + req->src->offset;
+	} else {
+		scatterwalk_map_and_copy(scratch->src, req->src, 0,
+					 req->slen, 0);
+		src = scratch->src;
+	}
+
+	if (req->dst && sg_nents(req->dst) == 1 && !PageHighMem(sg_page(req->dst)))
+		dst = page_to_virt(sg_page(req->dst)) + req->dst->offset;
+	else
+		dst = scratch->dst;
+
 	if (dir)
-		ret = crypto_scomp_compress(scomp, scratch->src, req->slen,
-					    scratch->dst, &req->dlen, *ctx);
+		ret = crypto_scomp_compress(scomp, src, req->slen,
+					    dst, &req->dlen, *ctx);
 	else
-		ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
-					      scratch->dst, &req->dlen, *ctx);
+		ret = crypto_scomp_decompress(scomp, src, req->slen,
+					      dst, &req->dlen, *ctx);
 	if (!ret) {
 		if (!req->dst) {
 			req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
@@ -152,8 +165,17 @@  static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 			ret = -ENOSPC;
 			goto out;
 		}
-		scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
-					 1);
+		if (dst == scratch->dst) {
+			scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
+						 req->dlen, 1);
+		} else {
+			int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
+			int i;
+			struct page *dst_page = sg_page(req->dst);
+
+			for (i = 0; i < nr_pages; i++)
+				flush_dcache_page(dst_page + i);
+		}
 	}
 out:
 	spin_unlock(&scratch->lock);