
[v5,02/12] crypto: acomp - Define new interfaces for compress/decompress batching.

Message ID 20241221063119.29140-3-kanchana.p.sridhar@intel.com
State Superseded
Series zswap IAA compress batching

Commit Message

Sridhar, Kanchana P Dec. 21, 2024, 6:31 a.m. UTC
This commit adds get_batch_size(), batch_compress() and batch_decompress()
interfaces to:

  struct acomp_alg
  struct crypto_acomp

A crypto_acomp compression algorithm that supports batching of compressions
and decompressions must provide implementations for these APIs.

A new helper function acomp_has_async_batching() can be invoked to query if
a crypto_acomp has registered these batching interfaces.

A higher level module like zswap can call acomp_has_async_batching() to
detect if the compressor supports batching, and if so, it can call the new
crypto_acomp_batch_size() to query the maximum batch-size supported by the
compressor. Based on this, zswap can take the minimum of any zswap-specific
upper limit on batch-size and the compressor's maximum batch-size when
allocating batching resources, as in the sketch below.
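
As an illustration (a minimal sketch; ZSWAP_MAX_BATCH_SIZE is a
hypothetical zswap-side cap, not part of this patch):

  /*
   * Number of batching resources to preallocate per CPU.
   * crypto_acomp_batch_size() returns 1 if the compressor did not
   * register the batching interfaces.
   */
  static unsigned int zswap_nr_batch_reqs(struct crypto_acomp *acomp)
  {
          return min(ZSWAP_MAX_BATCH_SIZE,
                     crypto_acomp_batch_size(acomp));
  }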

This allows the iaa_crypto Intel IAA driver to register implementations for
the get_batch_size(), batch_compress() and batch_decompress() acomp_alg
interfaces. These can subsequently be invoked from the kernel zswap/zram
modules, through the newly added crypto_acomp APIs listed below, to
compress/decompress pages in parallel on the IAA hardware accelerator,
improving swapout/swapin performance:

  crypto_acomp_batch_size()
  crypto_acomp_batch_compress()
  crypto_acomp_batch_decompress()

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 crypto/acompress.c                  |   3 +
 include/crypto/acompress.h          | 111 ++++++++++++++++++++++++++++
 include/crypto/internal/acompress.h |  19 +++++
 3 files changed, 133 insertions(+)

Comments

Herbert Xu Dec. 28, 2024, 11:46 a.m. UTC | #1
On Fri, Dec 20, 2024 at 10:31:09PM -0800, Kanchana P Sridhar wrote:
> This commit adds get_batch_size(), batch_compress() and batch_decompress()
> interfaces to:

First of all we don't need a batch compress/decompress interface
because the whole point of request chaining is to supply the data
in batches.

I'm also against having a get_batch_size because the user should
be supplying as much data as they're comfortable with.  In other
words if the user is happy to give us 8 requests for iaa then it
should be happy to give us 8 requests for every implementation.

The request chaining interface should be such that processing
8 requests is always better than doing 1 request at a time as
the cost is amortised.

Cheers,
Sridhar, Kanchana P Jan. 6, 2025, 5:37 p.m. UTC | #2
Hi Herbert,

> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Saturday, December 28, 2024 3:46 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> linux-crypto@vger.kernel.org; davem@davemloft.net; clabbe@baylibre.com;
> ardb@kernel.org; ebiggers@google.com; surenb@google.com; Accardi,
> Kristen C <kristen.c.accardi@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
> compress/decompress batching.
> 
> On Fri, Dec 20, 2024 at 10:31:09PM -0800, Kanchana P Sridhar wrote:
> > This commit adds get_batch_size(), batch_compress() and
> batch_decompress()
> > interfaces to:
> 
> First of all we don't need a batch compress/decompress interface
> because the whole point of request chaining is to supply the data
> in batches.
> 
> I'm also against having a get_batch_size because the user should
> be supplying as much data as they're comfortable with.  In other
> words if the user is happy to give us 8 requests for iaa then it
> should be happy to give us 8 requests for every implementation.
> 
> The request chaining interface should be such that processing
> 8 requests is always better than doing 1 request at a time as
> the cost is amortised.

Thanks for your comments. Can you please elaborate on how
request chaining would enable cost amortization for software
compressors? With the current implementation, a module like
zswap would need to do the following to invoke request chaining
for software compressors (in addition to pushing the chaining
to the user layer for IAA, as per your suggestion on not needing a
batch compress/decompress interface):

zswap_batch_compress():
   for (i = 0; i < nr_pages_in_batch; ++i) {
      /* set up the acomp_req "reqs[i]". */
      [ ... ]
      if (i)
	acomp_request_chain(reqs[i], reqs[0]);
      else
	acomp_reqchain_init(reqs[0], 0, crypto_req_done, crypto_wait);
   }

   /* Process the request chain in series. */
   err = crypto_wait_req(acomp_do_req_chain(reqs[0], crypto_acomp_compress), crypto_wait);

Internally, acomp_do_req_chain() would sequentially process the
request chain by:
1) adding all requests to a list "state"
2) calling "crypto_acomp_compress()" for the next list element
3) when this request completes, dequeuing it from the list "state"
4) repeating for all requests in "state"
5) when the last request in "state" completes, calling "reqs[0]->base.complete()",
    which notifies crypto_wait.
Herbert Xu Jan. 8, 2025, 1:38 a.m. UTC | #3
On Mon, Jan 06, 2025 at 07:10:53PM -0800, Yosry Ahmed wrote:
>
> The main problem is memory usage. Zswap needs a PAGE_SIZE*2-sized
> buffer for each request on each CPU. We preallocate these buffers to
> avoid trying to allocate this much memory in the reclaim path (i.e.
> potentially allocating two pages to reclaim one).

What if we allowed each acomp request to take a whole folio?
That would mean you'd only need to allocate one request per
folio, regardless of how big it is.
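
Roughly, on the zswap side, something like the following (just a shape
sketch; the acomp_request_set_src_folio()/acomp_request_set_unit_size()
helpers are hypothetical and don't exist today):

	/* one request covers the whole folio, processed in PAGE_SIZE units */
	acomp_request_set_src_folio(req, folio, 0, folio_size(folio));
	acomp_request_set_unit_size(req, PAGE_SIZE);
	err = crypto_wait_req(crypto_acomp_compress(req), &wait);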

Eric, we could do something similar with ahash.  Allow the
user to supply a folio (or scatterlist entry) instead of a
single page, and then cut it up based on a unit-size supplied
by the user (e.g., 512 bytes for sector-based users).  That
would mean just a single request object as long as your input
is a folio or something similar.

Is this something that you could use in fs/verity? You'd still
need to allocate enough memory to store the output hashes.

Cheers,
Yosry Ahmed Jan. 8, 2025, 1:43 a.m. UTC | #4
On Tue, Jan 7, 2025 at 5:39 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Mon, Jan 06, 2025 at 07:10:53PM -0800, Yosry Ahmed wrote:
> >
> > The main problem is memory usage. Zswap needs a PAGE_SIZE*2-sized
> > buffer for each request on each CPU. We preallocate these buffers to
> > avoid trying to allocate this much memory in the reclaim path (i.e.
> > potentially allocating two pages to reclaim one).
>
> What if we allowed each acomp request to take a whole folio?
> That would mean you'd only need to allocate one request per
> folio, regardless of how big it is.

Hmm this means we need to allocate a single request instead of N
requests, but the source of overhead is the output buffers not the
requests. We need PAGE_SIZE*2 for each page in the folio in the output
buffer on each CPU. Preallocating this unnecessarily adds up to a lot
of memory.

Did I miss something?

>
> Eric, we could do something similar with ahash.  Allow the
> user to supply a folio (or scatterlist entry) instead of a
> single page, and then cut it up based on a unit-size supplied
> by the user (e.g., 512 bytes for sector-based users).  That
> would mean just a single request object as long as your input
> is a folio or something similar.
>
> Is this something that you could use in fs/verity? You'd still
> need to allocate enough memory to store the output hashes.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
Herbert Xu Feb. 16, 2025, 5:17 a.m. UTC | #5
On Mon, Jan 06, 2025 at 07:10:53PM -0800, Yosry Ahmed wrote:
>
> The main problem is memory usage. Zswap needs a PAGE_SIZE*2-sized
> buffer for each request on each CPU. We preallocate these buffers to
> avoid trying to allocate this much memory in the reclaim path (i.e.
> potentially allocating two pages to reclaim one).

Actually this PAGE_SIZE * 2 thing baffles me.  Why would you
allocate more memory than the input? The comment says that it's
because certain hardware accelerators will disregard the output
buffer length, but surely that's just a bug in the driver?

Which driver does this? We should fix it or remove it if it's
writing output with no regard to the maximum length.

You should only ever need PAGE_SIZE for the output buffer, if
the output exceeds that then just fail the compression.

Cheers,
Yosry Ahmed Feb. 20, 2025, 5:32 p.m. UTC | #6
On Sun, Feb 16, 2025 at 01:17:59PM +0800, Herbert Xu wrote:
> On Mon, Jan 06, 2025 at 07:10:53PM -0800, Yosry Ahmed wrote:
> >
> > The main problem is memory usage. Zswap needs a PAGE_SIZE*2-sized
> > buffer for each request on each CPU. We preallocate these buffers to
> > avoid trying to allocate this much memory in the reclaim path (i.e.
> > potentially allocating two pages to reclaim one).
> 
> Actually this PAGE_SIZE * 2 thing baffles me.  Why would you
> allocate more memory than the input? The comment says that it's
> because certain hardware accelerators will disregard the output
> buffer length, but surely that's just a bug in the driver?
> 
> Which driver does this? We should fix it or remove it if it's
> writing output with no regard to the maximum length.
> 
> You should only ever need PAGE_SIZE for the output buffer, if
> the output exceeds that then just fail the compression.

I agree this should be fixed if it can be. This was discussed before
here:
https://lore.kernel.org/lkml/CAGsJ_4wuTZcGurby9h4PU2DwFaiEKB4bxuycaeyz3bPw3jSX3A@mail.gmail.com/

Barry is the one who brought up why we need PAGE_SIZE*2. Barry, could
you please chime in here?

> 
> Cheers,
> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
Barry Song Feb. 22, 2025, 6:26 a.m. UTC | #7
On Fri, Feb 21, 2025 at 6:32 AM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
>
> On Sun, Feb 16, 2025 at 01:17:59PM +0800, Herbert Xu wrote:
> > On Mon, Jan 06, 2025 at 07:10:53PM -0800, Yosry Ahmed wrote:
> > >
> > > The main problem is memory usage. Zswap needs a PAGE_SIZE*2-sized
> > > buffer for each request on each CPU. We preallocate these buffers to
> > > avoid trying to allocate this much memory in the reclaim path (i.e.
> > > potentially allocating two pages to reclaim one).
> >
> > Actually this PAGE_SIZE * 2 thing baffles me.  Why would you
> > allocate more memory than the input? The comment says that it's
> > because certain hardware accelerators will disregard the output
> > buffer length, but surely that's just a bug in the driver?
> >
> > Which driver does this? We should fix it or remove it if it's
> > writing output with no regard to the maximum length.
> >
> > You should only ever need PAGE_SIZE for the output buffer, if
> > the output exceeds that then just fail the compression.
>
> I agree this should be fixed if it can be. This was discussed before
> here:
> https://lore.kernel.org/lkml/CAGsJ_4wuTZcGurby9h4PU2DwFaiEKB4bxuycaeyz3bPw3jSX3A@mail.gmail.com/
>
> Barry is the one who brought up why we need PAGE_SIZE*2. Barry, could
> you please chime in here?

I'm not sure if any real hardware driver fails to return -ERRNO, but the
previous code comment suggests another reason why zRAM doesn't want to see
-ERRNO: "When we receive -ERRNO from the compression backend, there's
nothing more we can do":

int zcomp_compress(struct zcomp_strm *zstrm,
                const void *src, unsigned int *dst_len)
{
        /*
         * Our dst memory (zstrm->buffer) is always `2 * PAGE_SIZE' sized
         * because sometimes we can endup having a bigger compressed data
         * due to various reasons: for example compression algorithms tend
         * to add some padding to the compressed buffer. Speaking of padding,
         * comp algorithm `842' pads the compressed length to multiple of 8
         * and returns -ENOSP when the dst memory is not big enough, which
         * is not something that ZRAM wants to see. We can handle the
         * `compressed_size > PAGE_SIZE' case easily in ZRAM, but when we
         * receive -ERRNO from the compressing backend we can't help it
         * anymore. To make `842' happy we need to tell the exact size of
         * the dst buffer, zram_drv will take care of the fact that
         * compressed buffer is too big.
         */
        *dst_len = PAGE_SIZE * 2;

        return crypto_comp_compress(zstrm->tfm,
                        src, PAGE_SIZE,
                        zstrm->buffer, dst_len);
}

After reviewing the zRAM code, I don't see why zram_write_page() needs
to rely on
comp_len to call write_incompressible_page().

zram_write_page()
{
        ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
                             mem, &comp_len);
        kunmap_local(mem);

        if (unlikely(ret)) {
                zcomp_stream_put(zstrm);
                pr_err("Compression failed! err=%d\n", ret);
                return ret;
        }

        if (comp_len >= huge_class_size) {
                zcomp_stream_put(zstrm);
                return write_incompressible_page(zram, page, index);
        }
}

I mean, why can't we make it as the below:

zram_write_page()
{
        ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
                             mem, &comp_len);
        kunmap_local(mem);

        if (unlikely(ret && ret != -ENOSPC)) {
                zcomp_stream_put(zstrm);
                pr_err("Compression failed! err=%d\n", ret);
                return ret;
        }

        if (comp_len >= huge_class_size || ret) {
                zcomp_stream_put(zstrm);
                return write_incompressible_page(zram, page, index);
        }
}

As long as crypto drivers consistently return -ENOSPC or a specific error
code for dst_buf overflow, we should be able to eliminate the
2*PAGE_SIZE buffer.

My point is:
1. All drivers must be capable of handling dst_buf overflow.
2. All drivers must return a consistent and dedicated error code for
dst_buf overflow (see the sketch below).
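
For example, a conforming driver's compress path would then end with
something like this (a sketch of the proposed convention, not code from
any existing driver):

	if (out_len > req->dlen)
		return -ENOSPC;	/* dedicated dst_buf overflow error */

	req->dlen = out_len;
	return 0;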

+Minchan, Sergey,
Do you think we can implement this change in zRAM by using PAGE_SIZE instead
of 2 * PAGE_SIZE?

>
> >
> > Cheers,
> > --
> > Email: Herbert Xu <herbert@gondor.apana.org.au>
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> >

Thanks
Barry
Herbert Xu Feb. 22, 2025, 6:34 a.m. UTC | #8
On Sat, Feb 22, 2025 at 07:26:43PM +1300, Barry Song wrote:
>
> After reviewing the zRAM code, I don't see why zram_write_page() needs
> to rely on
> comp_len to call write_incompressible_page().
> 
> zram_write_page()
> {
>         ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
>                              mem, &comp_len);
>         kunmap_local(mem);
> 
>         if (unlikely(ret)) {
>                 zcomp_stream_put(zstrm);
>                 pr_err("Compression failed! err=%d\n", ret);
>                 return ret;
>         }
> 
>         if (comp_len >= huge_class_size) {
>                 zcomp_stream_put(zstrm);
>                 return write_incompressible_page(zram, page, index);
>         }
> }

Surely any compression error should just be treated as an
incompressible page?

I mean we might wish to report unusual errors in case the
admin or developer can do something about it, but for the
system as a whole it should still continue as if the page
was simply incompressible.

> As long as crypto drivers consistently return -ENOSPC or a specific error
> code for dst_buf overflow, we should be able to eliminate the
> 2*PAGE_SIZE buffer.

Yes we could certainly do that.

Cheers,
Barry Song Feb. 22, 2025, 6:41 a.m. UTC | #9
On Sat, Feb 22, 2025 at 7:34 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Sat, Feb 22, 2025 at 07:26:43PM +1300, Barry Song wrote:
> >
> > After reviewing the zRAM code, I don't see why zram_write_page() needs
> > to rely on
> > comp_len to call write_incompressible_page().
> >
> > zram_write_page()
> > {
> >         ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
> >                              mem, &comp_len);
> >         kunmap_local(mem);
> >
> >         if (unlikely(ret)) {
> >                 zcomp_stream_put(zstrm);
> >                 pr_err("Compression failed! err=%d\n", ret);
> >                 return ret;
> >         }
> >
> >         if (comp_len >= huge_class_size) {
> >                 zcomp_stream_put(zstrm);
> >                 return write_incompressible_page(zram, page, index);
> >         }
> > }
>
> Surely any compression error should just be treated as an
> incompressible page?

probably no, as an incompressible page might become compressible
after changing an algorithm. This is possible, users may switch an
algorithm to compress an incompressible page in the background.

Errors other than dst_buf overflow are a completely different matter
though :-)

>
> I mean we might wish to report unusual errors in case the
> admin or developer can do something about it, but for the
> system as a whole it should still continue as if the page
> was simply incompressible.
>
> > As long as crypto drivers consistently return -ENOSPC or a specific error
> > code for dst_buf overflow, we should be able to eliminate the
> > 2*PAGE_SIZE buffer.
>
> Yes we could certainly do that.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

Thanks
barry
Herbert Xu Feb. 22, 2025, 6:52 a.m. UTC | #10
On Sat, Feb 22, 2025 at 07:41:54PM +1300, Barry Song wrote:
>
> probably no, as an incompressible page might become compressible
> after changing an algorithm. This is possible, users may switch an
> algorithm to compress an incompressible page in the background.

I don't understand the difference.  If something is wrong with
the system causing the compression algorithm to fail, shouldn't
zswap just hobble along as if the page was incompressible?

In fact it would be quite reasonable to try to recompress it if
the admin did change the algorithm later because the error may
have been specific to the previous algorithm implementation.

Of course I totally agree that there should be a reporting
mechanism to catch errors that admins/developers should know
about.  But apart from reporting that error there should be
no difference between an inherently incompressible page vs.
buggy algorithm/broken hardware failing to compress the page.

Cheers,
Barry Song Feb. 22, 2025, 7:13 a.m. UTC | #11
On Sat, Feb 22, 2025 at 7:52 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Sat, Feb 22, 2025 at 07:41:54PM +1300, Barry Song wrote:
> >
> > probably no, as an incompressible page might become compressible
> > after changing an algorithm. This is possible, users may switch an
> > algorithm to compress an incompressible page in the background.
>
> I don't understand the difference.  If something is wrong with
> the system causing the compression algorithm to fail, shouldn't
> zswap just hobble along as if the page was incompressible?
>
> In fact it would be quite reasonable to try to recompress it if
> the admin did change the algorithm later because the error may
> have been specific to the previous algorithm implementation.
>

Somehow, I find your comment reasonable. Another point I want
to mention is the semantic difference. For example, in a system
with only one algorithm, a dst_buf overflow still means a successful
swap-out. However, other errors actually indicate an I/O failure.
In such cases, vmscan.c will log the relevant error in pageout() to
notify the user.

Anyway, I'm not an authority on this, so I’d like to see comments
from Minchan, Sergey, and Yosry.

> Of course I totally agree that there should be a reporting
> mechanism to catch errors that admins/developers should know
> about.  But apart from reporting that error there should be
> no difference between an inherently incompressible page vs.
> buggy algorithm/broken hardware failing to compress the page.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

Thanks
Barry
Herbert Xu Feb. 22, 2025, 7:22 a.m. UTC | #12
On Sat, Feb 22, 2025 at 08:13:13PM +1300, Barry Song wrote:
>
> Somehow, I find your comment reasonable. Another point I want
> to mention is the semantic difference. For example, in a system
> with only one algorithm, a dst_buf overflow still means a successful
> swap-out. However, other errors actually indicate an I/O failure.
> In such cases, vmscan.c will log the relevant error in pageout() to
> notify the user.

I'm talking specifically about the error from the Crypto API,
not any other error.  So if you werer using some sort of an
offload device to do the compression, that could indeed fail
due to an IO error (perhaps the PCI bus is on fire :)

But because that's reported through the Crypto API, it should
not be treated any differently than an incompressible page,
except for reporting purposes.

Cheers,
Barry Song Feb. 22, 2025, 8:21 a.m. UTC | #13
On Sat, Feb 22, 2025 at 8:23 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Sat, Feb 22, 2025 at 08:13:13PM +1300, Barry Song wrote:
> >
> > Somehow, I find your comment reasonable. Another point I want
> > to mention is the semantic difference. For example, in a system
> > with only one algorithm, a dst_buf overflow still means a successful
> > swap-out. However, other errors actually indicate an I/O failure.
> > In such cases, vmscan.c will log the relevant error in pageout() to
> > notify the user.
>
> I'm talking specifically about the error from the Crypto API,
> not any other error.  So if you were using some sort of an
> offload device to do the compression, that could indeed fail
> due to an IO error (perhaps the PCI bus is on fire :)
>
> But because that's reported through the Crypto API, it should
> not be treated any differently than an incompressible page,
> except for reporting purposes.

I'm referring more to the mm subsystem :-)

Let me provide a concrete example. Below is a small program that will
swap out 16MB of memory to zRAM:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MB (1024 * 1024)
#define SIZE (16 * MB)

int main() {
    void *addr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }

    for (size_t i = 0; i < SIZE / sizeof(int); i++) {
        ((int*)addr)[i] = rand();
    }

    if (madvise(addr, SIZE, MADV_PAGEOUT) != 0) {
        perror("madvise failed");
        return 1;
    }

    while (1);

    return 0;
}

For errors other than dst_buf overflow, we receive:

/ # ./a.out &
/ # free
               total        used        free      shared  buff/cache   available
Mem:          341228       77036      251872           0       20600      264192
Swap:        2703356           0     2703356
[1]+  Done                       ./a.out

/ # cat /proc/vmstat | grep swp
pswpin 0
pswpout 0
...

No memory has been swapped out, the swap-out counter is zero, and
the swap file is not used at all.

If this is an incompressible page(I mean dst_buf overflow error), there is
no actual issue, and we get the following:

/ #
/ # free
               total        used        free      shared  buff/cache   available
Mem:          341228       92948      236248           0       20372      248280
Swap:        2703356       16384     2686972

/ # cat /proc/vmstat | grep swp
pswpin 0
pswpout 4096
...


>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

Thanks
Barry
Sergey Senozhatsky Feb. 22, 2025, 12:31 p.m. UTC | #14
On (25/02/22 19:26), Barry Song wrote:
> After reviewing the zRAM code, I don't see why zram_write_page() needs
> to rely on
> comp_len to call write_incompressible_page().

[..]

> zram_write_page()
> {
>         ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
>                              mem, &comp_len);
>         kunmap_local(mem);
> 
> >         if (unlikely(ret && ret != -ENOSPC)) {
>                 zcomp_stream_put(zstrm);
>                 pr_err("Compression failed! err=%d\n", ret);
>                 return ret;
>         }
> 
>         if (comp_len >= huge_class_size || ret) {
>                 zcomp_stream_put(zstrm);
>                 return write_incompressible_page(zram, page, index);
>         }
> }

Sorry, I'm slower than usual now, but why should we?  Shouldn't compression
algorithms just never fail, even on 3D videos, because otherwise they won't
be able to validate their Weissman score or something :)

On a serious note - what is the use-case here?  Is the failure here due to
some random "cosmic rays" that taint the compression H/W?  If so then what
makes us believe that it's uni-directional?  What if it's decompression
that gets busted and then you can't decompress anything previously stored
compressed and stored in zsmalloc.  Wouldn't it be better in this case
to turn the computer off and on again?

The idea behind zram's code is that incompressible pages are not unusual,
they are quite usual, in fact. It's not necessarily that the data grew
in size after compression, the data is incompressible from zsmalloc PoV.
That is the algorithm wasn't able to compress a PAGE_SIZE buffer to an
object smaller than zsmalloc's huge-class-watermark (around 3600 bytes,
depending on zspage chain size).  That's why we look at the comp-len.
Anything else is an error, perhaps a pretty catastrophic error.

> As long as crypto drivers consistently return -ENOSPC or a specific error
> code for dst_buf overflow, we should be able to eliminate the
> 2*PAGE_SIZE buffer.
> 
> My point is:
> 1. All drivers must be capable of handling dst_buf overflow.
> 2. All drivers must return a consistent and dedicated error code for
> dst_buf overflow.

Sorry, where do these rules come from?

> +Minchan, Sergey,
> Do you think we can implement this change in zRAM by using PAGE_SIZE instead
> of 2 * PAGE_SIZE?

Sorry again, what problem are you solving?
Sergey Senozhatsky Feb. 22, 2025, 2:27 p.m. UTC | #15
On (25/02/22 21:31), Sergey Senozhatsky wrote:
> > As long as crypto drivers consistently return -ENOSPC or a specific error
> > code for dst_buf overflow, we should be able to eliminate the
> > 2*PAGE_SIZE buffer.
> > 
> > My point is:
> > 1. All drivers must be capable of handling dst_buf overflow.
> > 2. All drivers must return a consistent and dedicated error code for
> > dst_buf overflow.

So I didn't look at all of them, but at least S/W lzo1 doesn't even
have a notion of max-output-len.  lzo1x_1_compress() accepts a pointer
to out_len which tells the size of output stream (the algorithm is free
to produce any), so there is no dst_buf overflow as far as lzo1 is
concerned.  Unless I'm missing something or misunderstanding your points.
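
For reference, the signature from include/linux/lzo.h - dst_len is
purely an output parameter, there is no maximum output length for the
algorithm to respect:

	int lzo1x_1_compress(const unsigned char *src, size_t src_len,
			     unsigned char *dst, size_t *dst_len,
			     void *wrkmem);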
Barry Song Feb. 22, 2025, 4:24 p.m. UTC | #16
On Sun, Feb 23, 2025 at 1:31 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (25/02/22 19:26), Barry Song wrote:
> > After reviewing the zRAM code, I don't see why zram_write_page() needs
> > to rely on
> > comp_len to call write_incompressible_page().
>
> [..]
>
> > zram_write_page()
> > {
> >         ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
> >                              mem, &comp_len);
> >         kunmap_local(mem);
> >
> > >         if (unlikely(ret && ret != -ENOSPC)) {
> >                 zcomp_stream_put(zstrm);
> >                 pr_err("Compression failed! err=%d\n", ret);
> >                 return ret;
> >         }
> >
> >         if (comp_len >= huge_class_size || ret) {
> >                 zcomp_stream_put(zstrm);
> >                 return write_incompressible_page(zram, page, index);
> >         }
> > }
>
> Sorry, I'm slower than usual now, but why should we?  Shouldn't compression
> algorithms just never fail, even on 3D videos, because otherwise they won't
> be able to validate their Weissman score or something :)
>
> On a serious note - what is the use-case here?  Is the failure here due to
> some random "cosmic rays" that taint the  compression H/W?  If so then what
> makes us believe that it's uni-directional?  What if it's decompression
> that gets busted and then you can't decompress anything previously stored
> compressed and stored in zsmalloc.  Wouldn't it be better in this case
> to turn the computer off and on again?
>
> The idea behind zram's code is that incompressible pages are not unusual,
> they are quite usual, in fact. It's not necessarily that the data grew
> in size after compression, the data is incompressible from zsmalloc PoV.
> That is the algorithm wasn't able to compress a PAGE_SIZE buffer to an
> object smaller than zsmalloc's huge-class-watermark (around 3600 bytes,
> depending on zspage chain size).  That's why we look at the comp-len.
> Anything else is an error, perhaps a pretty catastrophic error.
>
> > As long as crypto drivers consistently return -ENOSPC or a specific error
> > code for dst_buf overflow, we should be able to eliminate the
> > 2*PAGE_SIZE buffer.
> >
> > My point is:
> > 1. All drivers must be capable of handling dst_buf overflow.
> > 2. All drivers must return a consistent and dedicated error code for
> > dst_buf overflow.
>
> Sorry, where do these rules come from?
>
> > +Minchan, Sergey,
> > Do you think we can implement this change in zRAM by using PAGE_SIZE instead
> > of 2 * PAGE_SIZE?
>
> Sorry again, what problem are you solving?

The context is that both zswap and zRAM currently use a destination buffer of
2 * PAGE_SIZE instead of just PAGE_SIZE. Herbert, Chengming, and Yosry are
questioning why it hasn't been reduced to a single PAGE_SIZE, and some
attempts have been made to do so [1][2].

The rules are based on my thoughts on feasibility if we aim to reduce it to a
single PAGE_SIZE.

[1] https://lore.kernel.org/linux-mm/Z7F1B_blIbByYBzz@gondor.apana.org.au/
[2] https://lore.kernel.org/lkml/20231213-zswap-dstmem-v4-1-f228b059dd89@bytedance.com/

Thanks
Barry
Herbert Xu Feb. 23, 2025, 12:14 a.m. UTC | #17
On Sat, Feb 22, 2025 at 11:27:49PM +0900, Sergey Senozhatsky wrote:
>
> So I didn't look at all of them, but at least S/W lzo1 doesn't even
> have a notion of max-output-len.  lzo1x_1_compress() accepts a pointer
> to out_len which tells the size of output stream (the algorithm is free
> to produce any), so there is no dst_buf overflow as far as lzo1 is
> concerned.  Unless I'm missing something or misunderstanding your points.

I just looked at deflate/zstd and they seem to be doing the right
things.

But yes lzo is a gaping security hole on the compression side.

The API has always specified a maximum output length and it needs
to be respected for both compression and decompression.

Cheers,
Herbert Xu Feb. 23, 2025, 12:24 a.m. UTC | #18
On Sat, Feb 22, 2025 at 09:31:41PM +0900, Sergey Senozhatsky wrote:
>
> The idea behind zram's code is that incompressible pages are not unusual,
> they are quite usual, in fact. It's not necessarily that the data grew
> in size after compression, the data is incompressible from zsmalloc PoV.
> That is the algorithm wasn't able to compress a PAGE_SIZE buffer to an
> object smaller than zsmalloc's huge-class-watermark (around 3600 bytes,
> depending on zspage chain size).  That's why we look at the comp-len.
> Anything else is an error, perhaps a pretty catastrophic error.

If you're rejecting everything above the watermark then you should
simply pass the watermark as the output length to the algorithm so
that it can stop doing useless work once it gets past that point.
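
I.e. roughly, against the zcomp_compress() code quoted earlier (a sketch
assuming a dedicated overflow error code, not actual zram code):

	*dst_len = huge_class_size;	/* instead of PAGE_SIZE * 2 */
	ret = crypto_comp_compress(zstrm->tfm, src, PAGE_SIZE,
			zstrm->buffer, dst_len);
	if (ret == -ENOSPC)
		/* didn't fit under the watermark: incompressible */
		return write_incompressible_page(zram, page, index);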

> > +Minchan, Sergey,
> > Do you think we can implement this change in zRAM by using PAGE_SIZE instead
> > of 2 * PAGE_SIZE?
> 
> Sorry again, what problem are you solving?

For compression, there is no point in allocating a destination buffer
that is bigger than the original.

Cheers,
Sergey Senozhatsky Feb. 23, 2025, 1:57 a.m. UTC | #19
On (25/02/23 08:24), Herbert Xu wrote:
> On Sat, Feb 22, 2025 at 09:31:41PM +0900, Sergey Senozhatsky wrote:
> >
> > The idea behind zram's code is that incompressible pages are not unusual,
> > they are quite usual, in fact,  It's not necessarily that the data grew
> > in size after compression, the data is incompressible from zsmalloc PoV.
> > That is the algorithm wasn't able to compress a PAGE_SIZE buffer to an
> > object smaller than zsmalloc's huge-class-watermark (around 3600 bytes,
> > depending on zspage chain size).  That's why we look at the comp-len.
> > Anything else is an error, perhaps a pretty catastrophic error.
> 
> If you're rejecting everything above the watermark then you should
> simply pass the watermark as the output length to the algorithm so
> that it can stop doing useless work once it gets past that point.

Makes sense.
Sergey Senozhatsky Feb. 23, 2025, 2:09 a.m. UTC | #20
On (25/02/23 08:14), Herbert Xu wrote:
> On Sat, Feb 22, 2025 at 11:27:49PM +0900, Sergey Senozhatsky wrote:
> >
> > So I didn't look at all of them, but at least S/W lzo1 doesn't even
> > have a notion of max-output-len.  lzo1x_1_compress() accepts a pointer
> > to out_len which tells the size of output stream (the algorithm is free
> > to produce any), so there is no dst_buf overflow as far as lzo1 is
> > concerned.  Unless I'm missing something or misunderstanding your points.
> 
> I just looked at deflate/zstd and they seem to be doing the right
> things.
> 
> But yes lzo is a gaping security hole on the compression side.

Right, for lzo/lzo-rle we need a safety page.

It also seems that there is no common way of reporting dst_buf overflow.
Some algos return -ENOSPC immediately, some don't return anything at all,
and deflate does its own thing - there are these places where they see
they are out of output space but they Z_OK it

if (s->pending != 0) {
	flush_pending(strm);
	if (strm->avail_out == 0) {
		/* Since avail_out is 0, deflate will be called again with
		 * more output space, but possibly with both pending and
		 * avail_in equal to zero. There won't be anything to do,
		 * but this is not an error situation so make sure we
		 * return OK instead of BUF_ERROR at next call of deflate:
		 */
		s->last_flush = -1;
		return Z_OK;
	}
}
Herbert Xu Feb. 23, 2025, 2:52 a.m. UTC | #21
On Sun, Feb 23, 2025 at 11:09:32AM +0900, Sergey Senozhatsky wrote:
>
> Right, for lzo/lzo-rle we need a safety page.

We should fix it because it's a security hole for anyone who calls
it through the Crypto API.

> It also seems that there is no common way of reporting dst_but overflow.
> Some algos return -ENOSPC immediately, some don't return anything at all,
> and deflate does it's own thing - there are these places where they see
> they are out of out space but they Z_OK it
> 
> if (s->pending != 0) {
> 	flush_pending(strm);
> 	if (strm->avail_out == 0) {
> 		/* Since avail_out is 0, deflate will be called again with
> 		 * more output space, but possibly with both pending and
> 		 * avail_in equal to zero. There won't be anything to do,
> 		 * but this is not an error situation so make sure we
> 		 * return OK instead of BUF_ERROR at next call of deflate:
> 		 */
> 		s->last_flush = -1;
> 		return Z_OK;
> 	}
> }

Z_OK is actually an error, see crypto/deflate.c:

	ret = zlib_deflate(stream, Z_FINISH);
	if (ret != Z_STREAM_END) {
		ret = -EINVAL;
		goto out;
	}

We could change this to ENOSPC for consistency.

If you do find anything that returns 0 through the Crypto API please
let me know.

Cheers,
Sergey Senozhatsky Feb. 23, 2025, 3:12 a.m. UTC | #22
On (25/02/23 10:52), Herbert Xu wrote:
> On Sun, Feb 23, 2025 at 11:09:32AM +0900, Sergey Senozhatsky wrote:
> >
> > Right, for lzo/lzo-rle we need a safety page.
> 
> We should fix it because it's a security hole for anyone who calls
> it through the Crypto API.

Yeah, I don't disagree.

> > It also seems that there is no common way of reporting dst_buf overflow.
> > Some algos return -ENOSPC immediately, some don't return anything at all,
> > and deflate does its own thing - there are these places where they see
> > they are out of output space but they Z_OK it
> > 
> > if (s->pending != 0) {
> > 	flush_pending(strm);
> > 	if (strm->avail_out == 0) {
> > 		/* Since avail_out is 0, deflate will be called again with
> > 		 * more output space, but possibly with both pending and
> > 		 * avail_in equal to zero. There won't be anything to do,
> > 		 * but this is not an error situation so make sure we
> > 		 * return OK instead of BUF_ERROR at next call of deflate:
> > 		 */
> > 		s->last_flush = -1;
> > 		return Z_OK;
> > 	}
> > }
> 
> Z_OK is actually an error, see crypto/deflate.c:

I saw Z_STREAM_END, but deflate states "this is not an error" and
there are more places like this.

> 	ret = zlib_deflate(stream, Z_FINISH);
> 	if (ret != Z_STREAM_END) {
> 		ret = -EINVAL;
> 		goto out;
> 	}
> 
> We could change this to ENOSPC for consistency.

So it will ENOSPC all errors, not sure how good that is.  We also
have lz4/lz4hc that return the number of bytes "(((char *)op) - dest)"
if successful and 0 otherwise.  So any error is 0. dst_buf overrun
is also 0, impossible to tell the difference, again not sure if we
can just ENOSPC.
Herbert Xu Feb. 23, 2025, 3:38 a.m. UTC | #23
On Sun, Feb 23, 2025 at 12:12:47PM +0900, Sergey Senozhatsky wrote:
>
> > > It also seems that there is no common way of reporting dst_buf overflow.
> > > Some algos return -ENOSPC immediately, some don't return anything at all,
> > > and deflate does its own thing - there are these places where they see
> > > they are out of output space but they Z_OK it
> > > 
> > > if (s->pending != 0) {
> > > 	flush_pending(strm);
> > > 	if (strm->avail_out == 0) {
> > > 		/* Since avail_out is 0, deflate will be called again with
> > > 		 * more output space, but possibly with both pending and
> > > 		 * avail_in equal to zero. There won't be anything to do,
> > > 		 * but this is not an error situation so make sure we
> > > 		 * return OK instead of BUF_ERROR at next call of deflate:
> > > 		 */
> > > 		s->last_flush = -1;
> > > 		return Z_OK;
> > > 	}
> > > }
> > 
> > Z_OK is actually an error, see crypto/deflate.c:
> 
> I saw Z_STREAM_END, but deflate states "this is not an error" and
> there are more places like this.

That would be a serious bug in deflate.  Where did you see it
return Z_STREAM_END in case of an overrun or error?

> So it will ENOSPC all errors, not sure how good that is.  We also
> have lz4/lz4hc that return the number of bytes "(((char *)op) - dest)"
> if successful and 0 otherwise.  So any error is 0. dst_buf overrun
> is also 0, impossible to tell the difference, again not sure if we
> can just ENOSPC.

I'm talking about the Crypto API calling convention.  Individual
compression libraries obviously have vastly different calling
conventions.

In the Crypto API, lz4 will return -EINVAL:

	int out_len = LZ4_compress_default(src, dst,
		slen, *dlen, ctx);

	if (!out_len)
		return -EINVAL;

Cheers,
Sergey Senozhatsky Feb. 23, 2025, 4:02 a.m. UTC | #24
On (25/02/23 11:38), Herbert Xu wrote:
> On Sun, Feb 23, 2025 at 12:12:47PM +0900, Sergey Senozhatsky wrote:
> >
> > > > It also seems that there is no common way of reporting dst_buf overflow.
> > > > Some algos return -ENOSPC immediately, some don't return anything at all,
> > > > and deflate does its own thing - there are these places where they see
> > > > they are out of output space but they Z_OK it
> > > > 
> > > > if (s->pending != 0) {
> > > > 	flush_pending(strm);
> > > > 	if (strm->avail_out == 0) {
> > > > 		/* Since avail_out is 0, deflate will be called again with
> > > > 		 * more output space, but possibly with both pending and
> > > > 		 * avail_in equal to zero. There won't be anything to do,
> > > > 		 * but this is not an error situation so make sure we
> > > > 		 * return OK instead of BUF_ERROR at next call of deflate:
> > > > 		 */
> > > > 		s->last_flush = -1;
> > > > 		return Z_OK;
> > > > 	}
> > > > }
> > > 
> > > Z_OK is actually an error, see crypto/deflate.c:
> > 
> > I saw Z_STREAM_END, but deflate states "this is not an error" and
> > there are more places like this.
> 
> That would be a serious bug in deflate.  Where did you see it
> return Z_STREAM_END in case of an overrun or error?

Oh, sorry for the confusion, I was talking about Z_OK for overruns.

> > So it will ENOSPC all errors, not sure how good that is.  We also
> > have lz4/lz4hc that return the number of bytes "(((char *)op) - dest)"
> > if successful and 0 otherwise.  So any error is 0. dst_buf overrun
> > is also 0, impossible to tell the difference, again not sure if we
> > can just ENOSPC.
> 
> I'm talking about the Crypto API calling convention.  Individual
> compression libraries obviously have vastly different calling
> conventions.
> 
> In the Crypto API, lz4 will return -EINVAL:
> 
> 	int out_len = LZ4_compress_default(src, dst,
> 		slen, *dlen, ctx);
> 
> 	if (!out_len)
> 		return -EINVAL;

Right, so you said that for deflate it could be

       ret = zlib_deflate(stream, Z_FINISH);
       if (ret != Z_STREAM_END) {
               ret = -ENOSPC;          // and not -EINVAL
               goto out;
       }

if I understood it correctly.  Which would make it: return 0 on success
or -ENOSPC otherwise.  So if crypto API wants consistency and returns -ENOSPC
for buffer overruns, then for lz4/lz4hc it also becomes binary: either 0 or
-ENOSPC.  Current -EINVAL return looks better to me, both for deflate and
for lz4/lz4hc.  -ENOSPC is an actionable error code, a user can double the
dst_out size and retry compression etc., while in reality it could be some
SW/HW issue that is misreported as -ENOSPC.



So re-iterating Barry's points:

> My point is:
> 1. All drivers must be capable of handling dst_buf overflow.

Not the case.

> 2. All drivers must return a consistent and dedicated error code for
> dst_buf overflow.

Not the case.
Herbert Xu Feb. 23, 2025, 6:04 a.m. UTC | #25
On Sun, Feb 23, 2025 at 01:02:41PM +0900, Sergey Senozhatsky wrote:
>
> if I understood it correctly.  Which would make it: return 0 on success
> or -ENOSPC otherwise.  So if crypto API wants consistency and returns -ENOSPC
> for buffer overruns, then for lz4/lz4hc it also becomes binary: either 0 or
> -ENOSPC.  Current -EINVAL return looks better to me, both for deflate and
> for lz4/lz4hc.  -ENOSPC is an actionable error code, a user can double the
> dst_out size and retry compression etc., while in reality it could be some
> SW/HW issue that is misreported as -ENOSPC.

When you're compressing you're trying to make it smaller.  It's
always better to not compress something rather than doubling the
buffer on ENOSPC.

In any case, no software compression algorithm should ever fail
for a reason other than ENOSPC.

Hardware offload devices can fail of course.
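
In zswap that would translate into something like this (a sketch of the
behaviour discussed above, not current zswap code; the label is
hypothetical):

	ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
	if (ret) {
		/* report unusual errors, but keep hobbling along */
		if (ret != -ENOSPC)
			pr_warn_ratelimited("compression failed: %d\n", ret);
		/* either way, take the incompressible-page path */
		goto handle_incompressible;
	}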

Cheers,
Yosry Ahmed Feb. 24, 2025, 9:49 p.m. UTC | #26
On Sat, Feb 22, 2025 at 08:13:13PM +1300, Barry Song wrote:
> On Sat, Feb 22, 2025 at 7:52 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> > On Sat, Feb 22, 2025 at 07:41:54PM +1300, Barry Song wrote:
> > >
> > > probably no, as an incompressible page might become compressible
> > > after changing an algorithm. This is possible, users may switch an
> > > algorithm to compress an incompressible page in the background.
> >
> > I don't understand the difference.  If something is wrong with
> > the system causing the compression algorithm to fail, shouldn't
> > zswap just hobble along as if the page was incompressible?
> >
> > In fact it would be quite reasonable to try to recompress it if
> > the admin did change the algorithm later because the error may
> > have been specific to the previous algorithm implementation.
> >
> 
> Somehow, I find your comment reasonable. Another point I want
> to mention is the semantic difference. For example, in a system
> with only one algorithm, a dst_buf overflow still means a successful
> swap-out. However, other errors actually indicate an I/O failure.
> In such cases, vmscan.c will log the relevant error in pageout() to
> notify the user.
> 
> Anyway, I'm not an authority on this, so I’d like to see comments
> from Minchan, Sergey, and Yosry.

Patch

diff --git a/crypto/acompress.c b/crypto/acompress.c
index cb6444d09dd7..165559a8b9bd 100644
--- a/crypto/acompress.c
+++ b/crypto/acompress.c
@@ -84,6 +84,9 @@  static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)
 
 	acomp->compress = alg->compress;
 	acomp->decompress = alg->decompress;
+	acomp->get_batch_size = alg->get_batch_size;
+	acomp->batch_compress = alg->batch_compress;
+	acomp->batch_decompress = alg->batch_decompress;
 	acomp->dst_free = alg->dst_free;
 	acomp->reqsize = alg->reqsize;
 
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index eadc24514056..8451ade70fd8 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -43,6 +43,10 @@  struct acomp_req {
  *
  * @compress:		Function performs a compress operation
  * @decompress:		Function performs a de-compress operation
+ * @get_batch_size:     Function returns the maximum batch-size for batching
+ *                      compress/decompress operations.
+ * @batch_compress:	Function performs a batch compress operation
+ * @batch_decompress:	Function performs a batch decompress operation
  * @dst_free:		Frees destination buffer if allocated inside the
  *			algorithm
  * @reqsize:		Context size for (de)compression requests
@@ -51,6 +55,21 @@  struct acomp_req {
 struct crypto_acomp {
 	int (*compress)(struct acomp_req *req);
 	int (*decompress)(struct acomp_req *req);
+	unsigned int (*get_batch_size)(void);
+	bool (*batch_compress)(struct acomp_req *reqs[],
+			       struct crypto_wait *wait,
+			       struct page *pages[],
+			       u8 *dsts[],
+			       unsigned int dlens[],
+			       int errors[],
+			       int nr_pages);
+	bool (*batch_decompress)(struct acomp_req *reqs[],
+				 struct crypto_wait *wait,
+				 u8 *srcs[],
+				 struct page *pages[],
+				 unsigned int slens[],
+				 int errors[],
+				 int nr_pages);
 	void (*dst_free)(struct scatterlist *dst);
 	unsigned int reqsize;
 	struct crypto_tfm base;
@@ -142,6 +161,13 @@  static inline bool acomp_is_async(struct crypto_acomp *tfm)
 	       CRYPTO_ALG_ASYNC;
 }
 
+static inline bool acomp_has_async_batching(struct crypto_acomp *tfm)
+{
+	return (acomp_is_async(tfm) &&
+		(crypto_comp_alg_common(tfm)->base.cra_flags & CRYPTO_ALG_TYPE_ACOMPRESS) &&
+		tfm->get_batch_size && tfm->batch_compress && tfm->batch_decompress);
+}
+
 static inline struct crypto_acomp *crypto_acomp_reqtfm(struct acomp_req *req)
 {
 	return __crypto_acomp_tfm(req->base.tfm);
@@ -306,4 +332,89 @@  static inline int crypto_acomp_decompress(struct acomp_req *req)
 	return crypto_acomp_reqtfm(req)->decompress(req);
 }
 
+/**
+ * crypto_acomp_batch_size() -- Get the algorithm's batch size
+ *
+ * Function returns the algorithm's batch size for batching operations
+ *
+ * @tfm:	ACOMPRESS tfm handle allocated with crypto_alloc_acomp()
+ *
+ * Return:	crypto_acomp's batch size.
+ */
+static inline unsigned int crypto_acomp_batch_size(struct crypto_acomp *tfm)
+{
+	if (acomp_has_async_batching(tfm))
+		return tfm->get_batch_size();
+
+	return 1;
+}
+
+/**
+ * crypto_acomp_batch_compress() -- Invoke asynchronous compress of
+ *                                  a batch of requests
+ *
+ * Function invokes the asynchronous batch compress operation
+ *
+ * @reqs: @nr_pages asynchronous compress requests.
+ * @wait: crypto_wait for acomp batch compress with synchronous/asynchronous
+ *        request chaining. If NULL, the driver must provide a way to process
+ *        request completions asynchronously.
+ * @pages: Pages to be compressed.
+ * @dsts: Pre-allocated destination buffers to store results of compression.
+ * @dlens: Will contain the compressed lengths.
+ * @errors: zero on successful compression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages to be compressed.
+ *
+ * Returns true if all compress requests complete successfully,
+ * false otherwise.
+ */
+static inline bool crypto_acomp_batch_compress(struct acomp_req *reqs[],
+					       struct crypto_wait *wait,
+					       struct page *pages[],
+					       u8 *dsts[],
+					       unsigned int dlens[],
+					       int errors[],
+					       int nr_pages)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]);
+
+	return tfm->batch_compress(reqs, wait, pages, dsts,
+				   dlens, errors, nr_pages);
+}
+
+/**
+ * crypto_acomp_batch_decompress() -- Invoke asynchronous decompress of
+ *                                    a batch of requests
+ *
+ * Function invokes the asynchronous batch decompress operation
+ *
+ * @reqs: @nr_pages asynchronous decompress requests.
+ * @wait: crypto_wait for acomp batch decompress with synchronous/asynchronous
+ *        request chaining. If NULL, the driver must provide a way to process
+ *        request completions asynchronously.
+ * @srcs: The src buffers to be decompressed.
+ * @pages: The pages to store the decompressed buffers.
+ * @slens: Compressed lengths of @srcs.
+ * @errors: zero on successful decompression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages to be decompressed.
+ *
+ * Returns true if all decompress requests complete successfully,
+ * false otherwise.
+ */
+static inline bool crypto_acomp_batch_decompress(struct acomp_req *reqs[],
+						 struct crypto_wait *wait,
+						 u8 *srcs[],
+						 struct page *pages[],
+						 unsigned int slens[],
+						 int errors[],
+						 int nr_pages)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]);
+
+	return tfm->batch_decompress(reqs, wait, srcs, pages,
+				     slens, errors, nr_pages);
+}
+
 #endif
diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h
index 53b4ef59b48c..df0e192801ff 100644
--- a/include/crypto/internal/acompress.h
+++ b/include/crypto/internal/acompress.h
@@ -17,6 +17,10 @@ 
  *
  * @compress:	Function performs a compress operation
  * @decompress:	Function performs a de-compress operation
+ * @get_batch_size:     Function returns the maximum batch-size for batching
+ *                      compress/decompress operations.
+ * @batch_compress:	Function performs a batch compress operation
+ * @batch_decompress:	Function performs a batch decompress operation
  * @dst_free:	Frees destination buffer if allocated inside the algorithm
  * @init:	Initialize the cryptographic transformation object.
  *		This function is used to initialize the cryptographic
@@ -37,6 +41,21 @@ 
 struct acomp_alg {
 	int (*compress)(struct acomp_req *req);
 	int (*decompress)(struct acomp_req *req);
+	unsigned int (*get_batch_size)(void);
+	bool (*batch_compress)(struct acomp_req *reqs[],
+			       struct crypto_wait *wait,
+			       struct page *pages[],
+			       u8 *dsts[],
+			       unsigned int dlens[],
+			       int errors[],
+			       int nr_pages);
+	bool (*batch_decompress)(struct acomp_req *reqs[],
+				 struct crypto_wait *wait,
+				 u8 *srcs[],
+				 struct page *pages[],
+				 unsigned int slens[],
+				 int errors[],
+				 int nr_pages);
 	void (*dst_free)(struct scatterlist *dst);
 	int (*init)(struct crypto_acomp *tfm);
 	void (*exit)(struct crypto_acomp *tfm);