
[net-next,v2,00/10] crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)

Message ID 20230530141635.136968-1-dhowells@redhat.com

Message

David Howells May 30, 2023, 2:16 p.m. UTC
Here's the fourth tranche of patches towards providing a MSG_SPLICE_PAGES
internal sendmsg flag that is intended to replace the ->sendpage() op with
calls to sendmsg().  MSG_SPLICE_PAGES is a hint that tells the protocol
that it should splice the pages supplied if it can.
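What "splice the pages supplied if it can" means can be sketched with a user-space C model. This is not kernel code. The flag value SKETCH_SPLICE_PAGES, struct sketch_frag and both functions are hypothetical. When the hint is set and splicing is possible, the fragment references the caller's buffer directly. Otherwise the data is copied, and copying is always a valid way to treat the hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bit; the real MSG_SPLICE_PAGES value is kernel-internal. */
#define SKETCH_SPLICE_PAGES 0x1u

struct sketch_frag {
	const unsigned char *data; /* caller memory (spliced) or our own copy */
	size_t len;
	bool owned;                /* true if data was allocated by us */
};

/*
 * Queue one fragment. With the splice hint set and can_splice true, keep a
 * reference to the caller's buffer (zero copy). Otherwise fall back to
 * copying, which is how a protocol may simply ignore the hint.
 */
static int sketch_queue(struct sketch_frag *f, const unsigned char *buf,
			size_t len, unsigned int flags, bool can_splice)
{
	if ((flags & SKETCH_SPLICE_PAGES) && can_splice) {
		f->data = buf;
		f->owned = false;
	} else {
		unsigned char *copy = malloc(len ? len : 1);

		if (!copy)
			return -1;
		memcpy(copy, buf, len);
		f->data = copy;
		f->owned = true;
	}
	f->len = len;
	return 0;
}

/* Drop the fragment, freeing it only if it was copied. */
static void sketch_release(struct sketch_frag *f)
{
	if (f->owned)
		free((void *)f->data);
	f->data = NULL;
	f->len = 0;
	f->owned = false;
}
```

In this model the caller must keep the buffer alive. In the kernel, a spliced page is instead kept alive by taking a ref or a pin on it.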

This set consists of the following parts:

 (1) Move netfs_extract_iter_to_sg() to somewhere more general and rename
     it to drop the "netfs" prefix.  We use this to extract directly from
     an iterator into a scatterlist.

 (2) Make AF_ALG use iov_iter_extract_pages().  This has the additional
     effect of pinning pages obtained from userspace rather than taking
     refs on them.  Pages from kernel-backed iterators would not be pinned,
     but AF_ALG isn't really meant for use by kernel services.

 (3) Change AF_ALG still further to use extract_iter_to_sg().

 (4) Make af_alg_sendmsg() support MSG_SPLICE_PAGES and make
     af_alg_sendpage() just a wrapper around sendmsg().  For the moment,
     this has to take refs on the pages.

 (5) Make hash_sendmsg() support MSG_SPLICE_PAGES by simply ignoring it.
     hash_sendpage() is left untouched to be removed later, after the
     splice core has been changed to call sendmsg().
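The scatterlist extraction in (1) and (3) can be pictured with a user-space sketch. It is not extract_iter_to_sg(): it neither pins nor refs anything. struct sketch_sg, sketch_extract_to_sg() and the 4096-byte page size are assumptions. The sketch only shows how a byte range is split into page-bounded (page, offset, length) entries, stopping when the supplied table is full.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration; the kernel's PAGE_SIZE is per-arch. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical, simplified stand-in for a scatterlist entry. */
struct sketch_sg {
	unsigned long page_index; /* which page the fragment lives in */
	size_t offset;            /* start of the fragment within that page */
	size_t len;               /* bytes in this fragment */
};

/*
 * Split [addr, addr + len) into page-bounded fragments, writing at most
 * max_sg entries. Returns the number of entries written and stores in
 * *consumed how many bytes they cover (less than len if the table filled).
 */
static size_t sketch_extract_to_sg(unsigned long addr, size_t len,
				   struct sketch_sg *sg, size_t max_sg,
				   size_t *consumed)
{
	size_t n = 0, done = 0;

	assert(sg != NULL && consumed != NULL);
	while (done < len && n < max_sg) {
		unsigned long cur = addr + done;
		size_t off = cur % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - off; /* room left in page */

		if (chunk > len - done)
			chunk = len - done;
		sg[n].page_index = cur / SKETCH_PAGE_SIZE;
		sg[n].offset = off;
		sg[n].len = chunk;
		n++;
		done += chunk;
	}
	*consumed = done;
	return n;
}
```

For example, 10000 bytes starting 100 bytes into page 0 produce three entries: 3996 bytes from page 0, 4096 from page 1 and 1908 from page 2.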

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-4

David

ver #2)
 - Put the "netfs_" prefix removal first to shorten lines and avoid
   checkpatch 80-char warnings.
 - Fix a couple of spelling mistakes.
 - Wrap some lines at 80 chars.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230526143104.882842-1-dhowells@redhat.com/ # v1

David Howells (10):
  Drop the netfs_ prefix from netfs_extract_iter_to_sg()
  Fix a couple of spelling mistakes
  Wrap lines at 80
  Move netfs_extract_iter_to_sg() to lib/scatterlist.c
  crypto: af_alg: Pin pages rather than ref'ing if appropriate
  crypto: af_alg: Use extract_iter_to_sg() to create scatterlists
  crypto: af_alg: Indent the loop in af_alg_sendmsg()
  crypto: af_alg: Support MSG_SPLICE_PAGES
  crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES
  crypto: af_alg/hash: Support MSG_SPLICE_PAGES

 crypto/af_alg.c         | 185 ++++++++++++---------------
 crypto/algif_aead.c     |  38 +++---
 crypto/algif_hash.c     | 114 +++++++++++------
 crypto/algif_skcipher.c |  10 +-
 fs/cifs/smb2ops.c       |   4 +-
 fs/cifs/smbdirect.c     |   2 +-
 fs/netfs/iterator.c     | 266 ---------------------------------------
 include/crypto/if_alg.h |   7 +-
 include/linux/netfs.h   |   4 -
 include/linux/uio.h     |   5 +
 lib/scatterlist.c       | 269 ++++++++++++++++++++++++++++++++++++++++
 11 files changed, 459 insertions(+), 445 deletions(-)

Comments

Herbert Xu June 6, 2023, 8:43 a.m. UTC | #1
On Tue, May 30, 2023 at 03:16:34PM +0100, David Howells wrote:
>
> -	if (limit > sk->sk_sndbuf)
> -		limit = sk->sk_sndbuf;
> +	/* Don't limit to ALG_MAX_PAGES if the pages are all already pinned. */
> +	if (!user_backed_iter(&msg->msg_iter))
> +		max_pages = INT_MAX;
> +	else
> +		max_pages = min_t(size_t, max_pages,
> +				  DIV_ROUND_UP(sk->sk_sndbuf, PAGE_SIZE));

What's the purpose of relaxing this limit? Even if there is a reason
for this shouldn't this be in a patch by itself?

Thanks,
David Howells June 6, 2023, 9:24 a.m. UTC | #2
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> > -	if (limit > sk->sk_sndbuf)
> > -		limit = sk->sk_sndbuf;
> > +	/* Don't limit to ALG_MAX_PAGES if the pages are all already pinned. */
> > +	if (!user_backed_iter(&msg->msg_iter))
> > +		max_pages = INT_MAX;

If the iov_iter is a kernel-backed type (BVEC, KVEC, XARRAY), then (a) all the
pages it refers to must already be pinned in memory and (b) the caller must
have limited it in some way (splice is limited by the pipe capacity, for
instance).  In that case, it seems pointless to take more than one pass of the
while loop if we can avoid it, at least from the point of view of memory
handling; granted, there might be other criteria, such as hogging crypto
offload hardware.
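The "one pass of the while loop" point is ceiling division. This hypothetical helper counts how many extract-then-process passes are needed for total_pages pages under a given per-pass cap. SKETCH_NO_CAP mirrors the max_pages = INT_MAX branch in the quoted hunk. Everything else is illustrative and not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mirrors "max_pages = INT_MAX" for kernel-backed iterators in the hunk. */
#define SKETCH_NO_CAP ((size_t)INT_MAX)

/*
 * Hypothetical helper: passes of an "extract up to max_pages, then process
 * them" loop needed to consume total_pages pages. A partial final batch
 * still costs a whole pass, so this is ceiling division (written so that
 * it cannot overflow).
 */
static size_t sketch_loop_passes(size_t total_pages, size_t max_pages)
{
	assert(max_pages > 0);
	return total_pages / max_pages + (total_pages % max_pages != 0);
}
```

With a 16-page cap, 64 already-pinned pages take four passes. With SKETCH_NO_CAP they take one.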

> > +	else
> > +		max_pages = min_t(size_t, max_pages,
> > +				  DIV_ROUND_UP(sk->sk_sndbuf, PAGE_SIZE));
> 
> What's the purpose of relaxing this limit?

If the iov_iter is a user-backed type (IOVEC or UBUF) then it's not relaxed.
max_pages is ALG_MAX_PAGES here (actually, I should just move that here so
that it's clearer).

I am, however, also applying the sk_sndbuf limit here: there's no point
extracting more pages than we need if ALG_MAX_PAGES whole pages would
overrun the byte limit.
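The rule in the quoted hunk can be restated as a small pure function. This is a sketch, not the kernel code. The page size (4096) and ALG_MAX_PAGES (16) values are assumptions chosen for illustration, a bool stands in for user_backed_iter(), and sketch_max_pages() is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE     4096UL
#define SKETCH_ALG_MAX_PAGES 16UL

#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Model of the v2 rule: kernel-backed iterators get no page cap, while
 * user-backed ones are capped at ALG_MAX_PAGES and at the number of whole
 * pages needed to cover sndbuf bytes.
 */
static size_t sketch_max_pages(bool user_backed, size_t sndbuf)
{
	size_t max_pages = SKETCH_ALG_MAX_PAGES;
	size_t sndbuf_pages;

	if (!user_backed)
		return INT_MAX; /* pages are already pinned by the caller */

	sndbuf_pages = SKETCH_DIV_ROUND_UP(sndbuf, SKETCH_PAGE_SIZE);
	return max_pages < sndbuf_pages ? max_pages : sndbuf_pages;
}
```

With these values, an sk_sndbuf of 10000 bytes limits a user-backed iterator to 3 pages, while a large sk_sndbuf leaves the ALG_MAX_PAGES cap of 16 in force.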

> Even if there is a reason for this shouldn't this be in a patch by itself?

I suppose I could do it as a follow-on patch, and before that apply
ALG_MAX_PAGES and sk_sndbuf to kernel-backed iterators just as for
user-backed ones.

Actually, is it worth paying attention to sk_sndbuf for kernel-backed
iterators?

David
Herbert Xu June 6, 2023, 9:30 a.m. UTC | #3
On Tue, Jun 06, 2023 at 10:24:55AM +0100, David Howells wrote:
>
> If the iov_iter is a user-backed type (IOVEC or UBUF) then it's not relaxed.
> max_pages is ALG_MAX_PAGES here (actually, I should just move that here so
> that it's clearer).

Even if it's kernel memory, the pages can't be freed during the hashing
operation, which could be long if the amount is large (or the algorithm
is slow).

The reason for the limit here is to stop a malicious user from
pinning an unlimited amount of memory by doing a hashing operation,
IOW a DoS attack.

So I think we should keep the limit as is.

Cheers,
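Herbert's point is that the cap bounds how much memory a single operation can hold, however large the request is. Below is a minimal sketch of that bound, applying the same limits to every iterator type. The constants are assumed values and sketch_bytes_per_pass() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE     4096UL /* assumed */
#define SKETCH_ALG_MAX_PAGES 16UL   /* assumed */

/*
 * Upper bound on bytes held by a single pass when the same caps apply to
 * every iterator type: no more than requested, no more than sndbuf, and
 * no more than ALG_MAX_PAGES whole pages.
 */
static size_t sketch_bytes_per_pass(size_t requested, size_t sndbuf)
{
	size_t limit = requested;
	size_t page_cap = SKETCH_ALG_MAX_PAGES * SKETCH_PAGE_SIZE;

	if (limit > sndbuf)
		limit = sndbuf;
	if (limit > page_cap)
		limit = page_cap;
	return limit;
}
```

Under these assumptions, a 1 GiB request against a 212992-byte sk_sndbuf holds at most 65536 bytes per pass, whether the iterator is user-backed or kernel-backed.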
David Howells June 6, 2023, 10:08 a.m. UTC | #4
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> So I think we should keep the limit as is.

Okay.

David