
net: bypass ->sendpage for slab pages

Message ID 20200819051945.1797088-1-hch@lst.de
State New
Series net: bypass ->sendpage for slab pages

Commit Message

Christoph Hellwig Aug. 19, 2020, 5:19 a.m. UTC
Sending Slab or tail pages into ->sendpage will cause a really strange
delayed oops.  Prevent it right in the networking code instead of
requiring drivers to guess the exact conditions under which sendpage works.

Based on a patch from Coly Li <colyli@suse.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/socket.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
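
To make the failure mode described in the commit message concrete, here is
an illustrative sketch; the function and its names are hypothetical and not
taken from any in-tree driver.  A buffer obtained from kmalloc() sits on a
Slab page, so the page reference that ->sendpage() takes does not pin the
underlying object, and an skb may still point at the memory long after the
slab allocator has reused it:

#include <linux/mm.h>
#include <linux/net.h>
#include <linux/slab.h>

/* Hypothetical caller, for illustration only (payload setup omitted). */
static int example_send_header(struct socket *sock, size_t len)
{
	void *hdr = kmalloc(len, GFP_KERNEL);
	int ret;

	if (!hdr)
		return -ENOMEM;

	/* This hands a Slab-backed page to ->sendpage(): the skb may still
	 * reference the page after the kfree() below, once the slab
	 * allocator has handed the memory to someone else, which is the
	 * delayed oops described above.
	 */
	ret = kernel_sendpage(sock, virt_to_page(hdr),
			      offset_in_page(hdr), len, 0);

	kfree(hdr);
	return ret;
}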

Comments

Coly Li Sept. 18, 2020, 8:37 a.m. UTC | #1
On 2020/8/22 05:14, David Miller wrote:
> From: Christoph Hellwig <hch@lst.de>
> Date: Thu, 20 Aug 2020 06:37:44 +0200
> 
>> If you look at who uses sendpage outside the networking layer itself
>> you see that it is basically block driver and file systems.  These
>> have no way to control what memory they get passed and have to deal
>> with everything someone throws at them.
> 
> I see nvme doing virt_to_page() on several things when it calls into
> kernel_sendpage().
> 
> This is the kind of stuff I want cleaned up, and which your patch
> will neither trap nor address.
> 
> In nvme it sometimes seems to check for sendpage validity:
> 
> 		/* can't zcopy slab pages */
> 		if (unlikely(PageSlab(page))) {
> 			ret = sock_no_sendpage(queue->sock, page, offset, len,
> 					flags);
> 		} else {
> 			ret = kernel_sendpage(queue->sock, page, offset, len,
> 					flags);
> 		}
> 
> Yet elsewhere it does not, and just blindly calls:
> 
> 	ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
> 			offset_in_page(pdu) + req->offset, len,  flags);
> 
> This pdu seems to come from a page frag allocation.
> 
> That's the host side.  On the target side:
> 
> 		ret = kernel_sendpage(cmd->queue->sock, page, cmd->offset,
> 					left, flags);
> 
> No page slab check or anything like that.
> 
> I'm hesitant to put in the kernel_sendpage() patch, because it provides a
> disincentive to fix up code like this.
> 

Hi David and Christoph,

It has been quiet for a while; what should we do next about the
kernel_sendpage() related issue?

Will Christoph's series or mine be considered a proper fix, or should I
wait for some better idea to show up? Either is fine with me, as long
as the problem gets fixed.

Thanks in advance.

Coly Li
Christoph Hellwig Sept. 21, 2020, 2:25 p.m. UTC | #2
On Fri, Sep 18, 2020 at 04:37:24PM +0800, Coly Li wrote:
> Hi David and Christoph,
> 
> It has been quiet for a while; what should we do next about the
> kernel_sendpage() related issue?
> 
> Will Christoph's series or mine be considered a proper fix, or should I
> wait for some better idea to show up? Either is fine with me, as long
> as the problem gets fixed.

I think for all the network storage stuff we really need a "send me
out a page" helper, and the nvmet bits that Dave pointed to look to
me like they actually are currently broken.

Given that Dave doesn't want to change the kernel_sendpage() semantics,
I'll resend with a new helper instead.  Any preferences for a name?
safe_sendpage?  kernel_sendpage_safe?  kernel_send_one_page?
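
For illustration only, here is one possible shape for such a helper, using
one of the candidate names above and the same Slab/refcount test as the
hunk below.  This is a sketch, not the patch that was eventually merged:

#include <linux/mm.h>
#include <linux/net.h>

int kernel_sendpage_safe(struct socket *sock, struct page *page,
			 int offset, size_t size, int flags)
{
	/* Refuse zero-copy for Slab pages and for the refcount-less tails
	 * of non-__GFP_COMP high-order pages; use the copying path instead.
	 */
	if (sock->ops->sendpage && !PageSlab(page) && page_count(page) > 0)
		return sock->ops->sendpage(sock, page, offset, size, flags);

	return sock_no_sendpage(sock, page, offset, size, flags);
}

Callers that currently open-code a PageSlab() check could switch to such a
helper, while kernel_sendpage() itself keeps its current semantics.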

Patch

diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395da..b4e65688915fe3 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3638,7 +3638,11 @@ EXPORT_SYMBOL(kernel_getpeername);
 int kernel_sendpage(struct socket *sock, struct page *page, int offset,
 		    size_t size, int flags)
 {
-	if (sock->ops->sendpage)
+	/* sendpage manipulates the refcount of the page passed in, which
+	 * does not work for Slab pages or for the tails of non-__GFP_COMP
+	 * high-order pages.
+	 */
+	if (sock->ops->sendpage && !PageSlab(page) && page_count(page) > 0)
 		return sock->ops->sendpage(sock, page, offset, size, flags);
 
 	return sock_no_sendpage(sock, page, offset, size, flags);
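
As a usage note, and for illustration only: with this check in place, a
caller modelled on the nvme-tcp snippet quoted earlier in the thread no
longer needs its own PageSlab() branch, because unsafe pages transparently
take the copying sock_no_sendpage() path inside kernel_sendpage() itself.
That is also exactly the behaviour Dave pushes back on above, since it
removes the incentive to clean up such callers:

#include <linux/mm.h>
#include <linux/net.h>

/* Hypothetical caller, simplified from the pattern quoted in the thread. */
static int example_send_data_page(struct socket *sock, struct page *page,
				  int offset, size_t len, int flags)
{
	/* No PageSlab() special case needed any more: kernel_sendpage()
	 * falls back to sock_no_sendpage() on its own for unsafe pages.
	 */
	return kernel_sendpage(sock, page, offset, len, flags);
}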